Query processor, query processor elements and a method of establishing such a query processor and query processor elements and a domain processor

ABSTRACT

The invention relates to a domain processor (DP) comprising: at least one robot modeller (RM); at least one domain modeller (DMR); at least one Query Processor Modeller (QPM), said robot modeller (RM) comprising: means for modelling at least one computer-based robot (R); said at least one robot (R) being adapted for accessing at least one web-based data source (DS); said at least one data source (DS) comprising entities comprised in a predefined domain (D); said at least one domain modeller (DMR) comprising: means for modelling at least one domain model (DM); means for establishing at least one extraction model (EM) associated with a chosen domain; means for establishing at least one storage model (STM) associated with said chosen domain, said at least one Query Processor Modeller (QPM) comprising: means for selecting at least two Query Processor elements (QPE) from a set of predefined query processor elements (QPE); means for combining at least two of the selected Query Processor elements (QPE); means for executing said associated query processor elements on at least one computer system (CS); at least one of said query processor elements (QPE) of the associated query processor elements being a Robot query processor Element (RQPE) adapted for accessing at least one web-based data source (DS).

[0001] The invention relates to a query processor, query processor elements and a method of establishing such a query processor and query processor elements and a domain processor.

FIELD OF THE INVENTION

[0002] The invention deals with accessing, i.e. reading and/or writing in data sources associated with a certain domain. The data sources are typically web-based which basically means that the data of the data source are made available to the user according to a serial transfer protocol, e.g. HTTP via the Internet. The serial transfer of the data made available to the user is sometimes easily conceivable to a user, especially when dealing with a simple and quite specific request. A problem with data retrieval from web-based data sources is that the user must typically find one or several data sources comprising the relevant data. This search may be very time consuming and typically non-exhaustive due to the fact that several data sources may easily be overlooked. Moreover, the user has to perform further queries on each site and these queries typically have to be formulated differently from site to site.

[0003] This problem has been dealt with in the prior art by applying robots and agents with the purpose of collecting information within a certain domain of interest and by providing these domain data or an extraction of the data to a user in a more straightforward searchable way.

[0004] A problem with the known systems applying agents is that the agents require some kind of knowledge about the data source structure, and the use of the agent requires the acceptance of the owner of the data source due to the fact that an agent may dig into a data source more or less out of control.

[0005] Another problem with the known systems applying robots is also that the robots require some kind of knowledge about the data source structure, e.g. knowledge of the structure of data containing an HTML table of a web-based data source, and if this knowledge is not available, the programming of such a robot is quite difficult. Hence, the applicable number of robots retrieving data from such data sources is limited as is the data of interest in the domain.

[0006] It is an object of the invention to provide a domain processor capable of processing even large-scale domains.

SUMMARY OF THE INVENTION

[0007] The invention relates to a domain processor (DP) according to claim 1 comprising

[0008] at least one robot modeller (RM)

[0009] at least one domain modeller (DMR),

[0010] at least one Query Processor Modeller (QPM)

[0011] said robot modeller (RM) comprising

[0012] means for modelling at least one computer-based robot (R),

[0013] said at least one robot (R) being adapted for accessing at least one web-based data source (DS),

[0014] said at least one data source (DS) comprising entities comprised in a predefined domain (D),

[0015] said at least one domain modeller (DMR) comprising

[0016] means for modelling at least one domain model (DM) associated with at least one chosen domain, said domain model (DM) comprising at least one extraction model (EM) and at least one storage model (STM),

[0017] means for establishing at least one extraction model (EM) associated with a chosen domain,

[0018] means for establishing at least one storage model (STM) associated with said chosen domain,

[0019] said at least one Query Processor Modeller (QPM) comprising

[0020] means for selecting at least two Query Processor elements (QPE) from a set of predefined query processor elements (QPE),

[0021] means for combining at least two of the selected Query Processor elements (QPE),

[0022] means for executing said associated query processor elements on at least one computer system (CS),

[0023] at least one of said query processor elements (QPE) of associated query processor elements being a Robot query processor Element (RQPE) adapted for accessing at least one web-based data source (DS).

[0024] When, as stated in claim 2, the domain processor (DP) comprises at least one query processor maintenance manager (QMM), said at least one query processor maintenance manager (QMM) comprising means for executing at least one query processor (QP) established by the domain processor, an advantageous embodiment has been obtained.

[0025] According to the invention, the domain processor may advantageously comprise a tool for running a query processor established by the domain processor. The query processor maintenance manager may thus be adapted for running the query processor on one or several servers.

[0026] Such a manager may include a visual tool illustrating the running state of the query processor and the individual elements. An example of such intuitive processing is that the individual elements change color according to their state, e.g. within a color range from white to red, depending on the load of the elements.
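By way of a non-limiting illustration, the load-dependent coloring of an element could be sketched as follows in Python. The function name `load_to_rgb`, the normalization of the load to the range 0.0 to 1.0 and the linear interpolation between white and red are assumptions made for this sketch only; the description does not prescribe them.

```python
from typing import Tuple


def load_to_rgb(load: float) -> Tuple[int, int, int]:
    """Map a normalized load (assumed 0.0 = idle, 1.0 = full) to an RGB color.

    An idle element is drawn white, a fully loaded element red. Loads outside
    the assumed range are clamped.
    """
    clamped = max(0.0, min(1.0, load))
    # Green and blue fade out together as the load rises; red stays at maximum.
    fade = round(255 * (1.0 - clamped))
    return (255, fade, fade)
```

A monitoring view could call such a function once per element and refresh interval; other mappings, e.g. stepwise thresholds, would serve equally well.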

[0027] Moreover, the manager should preferably illustrate basic on-off conditions visually, i.e. illustrate actively if an element is working properly, and whether entities are transferred between the query processor elements and whether entities may actually be transferred between elements. The latter feature may ease operation of the system significantly due to the fact that the absence of an entity flow between the elements does not necessarily indicate that a fault condition has occurred; the element may simply not have been queried.

[0028] Determination of a “clear road” between the elements may e.g. be established by forwarding dummy (testing) queries between elements at certain intervals.
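As a minimal sketch of such a determination, a dummy query may be passed through each element of a path in turn, the road being considered clear only if every element returns something. The names `PROBE_QUERY` and `road_is_clear`, and the convention that an element is a callable returning a message or `None`, are hypothetical and serve illustration only.

```python
from typing import Callable, List, Optional

Message = dict
# Assumption: an element accepts a message and returns a message, or None
# if it cannot pass anything on.
Element = Callable[[Message], Optional[Message]]

# Hypothetical dummy (testing) query, marked so that elements can ignore it.
PROBE_QUERY: Message = {"probe": True}


def road_is_clear(path: List[Element]) -> bool:
    """Forward a dummy query along the path; report whether it got through."""
    message: Optional[Message] = dict(PROBE_QUERY)
    for element in path:
        try:
            message = element(message)
        except Exception:
            # An element that raises cannot currently transfer entities.
            return False
        if message is None:
            return False
    return True
```

In a running system such a check would be scheduled at certain intervals, so that an idle but healthy path can be distinguished from a broken one.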

[0029] Moreover, the Query Processor Modeller may include submenus facilitating specialized execution of the query processor.

[0030] Moreover, the invention relates to a robot modeller (RM) according to claim 3 comprising

[0031] means for modelling at least one computer-based robot (R),

[0032] said at least one robot (R) being adapted for accessing at least one web-based data source (DS),

[0033] said at least one data source (DS) comprising entities comprised in a predefined domain (D).

[0034] Moreover, the invention relates to a domain modeller (DMR) according to claim 4 comprising

[0035] means for modelling at least one domain model (DM) associated with at least one chosen domain, said domain model (DM) comprising at least one extraction model (EM) and at least one storage model (STM),

[0036] means for establishing at least one extraction model (EM) associated with a chosen domain,

[0037] means for establishing at least one storage model (STM) associated with said chosen domain.

[0038] Thus, a domain model represents a structured way of defining properties of different aspects of a domain.

[0039] A domain model may e.g. comprise an extraction model, i.e. a definition of relevant entities and attributes to be looked for in the web-based data source. It should be noted that the extraction model may primarily describe (or mask) the data source on the basis of text strings and combinations of such strings.

[0040] A chosen domain may e.g. be “cars offered for sale”.

[0041] When the domain modeller comprises means for establishing reference mapping between extracted data obtained according to said extraction model (EM) and a conceptual representation of said data, a further advantageous embodiment of the invention has been obtained.

[0042] When said reference mapping defines a set of reference entities describing a number of entities (E), said entities having attributes, a further advantageous embodiment of the invention has been obtained.

[0043] A set of reference entities may e.g. be a product catalogue.

[0044] Reference mapping may facilitate the possibility of adding knowledge to the retrieved entities. Such information may e.g. be information deducible from a reference product catalogue. Thus, if an entity is matched to an entity type of the product catalogue, the entity may be modified, e.g. by validating it, correcting it or inserting additional information about the entity.

[0045] A correction may e.g. be that one of the attributes of the Porsche retrieved above is false according to the product catalogue. This false attribute may be detected in several different ways within the scope of the invention. The reference product catalogue may e.g. initially reveal that no Porsche having a 3.0 liter engine has been made with a diesel engine. Moreover, the product catalogue may reveal that no Porsche has been made with a diesel engine, thereby raising the probability that the data source provider has made a mistake. The wrong attribute “Diesel” may then be corrected.

[0046] Furthermore, the reference entities may be applied for different variants of classification and validation.

[0047] When the domain modeller (DMR) comprises means for establishing at least one language domain dictionary (LDD), a further advantageous embodiment of the invention has been obtained.

[0048] When said at least one language domain dictionary (LDD) maps the language of the extracted entities into the general language of the query processor (QP), a further advantageous embodiment of the invention has been obtained.

[0049] The general language of the query processor may e.g. be regarded as the “language” defined by an object-oriented conceptual model associated with the query processor. Such language may e.g. be a preferred language or coding chosen as the general language. Hence, the language domain dictionary may e.g. make it possible to have an entity that reads “wagen” or “bil” transformed into an instance of an object “car”.
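A language domain dictionary of this kind may, purely as an illustrative sketch, be thought of as a lookup table from source-language terms to conceptual type names. The table contents beyond “wagen” and “bil”, the name `to_concept` and the choice to normalize case and surrounding whitespace are assumptions of this sketch.

```python
from typing import Optional

# Hypothetical dictionary: extracted term (lower case) -> conceptual type name.
LANGUAGE_DOMAIN_DICTIONARY = {
    "wagen": "car",
    "bil": "car",
    "car": "car",
}


def to_concept(term: str) -> Optional[str]:
    """Return the conceptual type for an extracted term, or None if unknown."""
    return LANGUAGE_DOMAIN_DICTIONARY.get(term.strip().lower())
```

An unknown term yields `None` here rather than a guess, leaving the decision on how to treat it to a later element.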

[0050] When said domain modeller (DMR) comprises means for establishing a set of reference recognition patterns, a further advantageous embodiment of the invention has been obtained.

[0051] The set of reference recognition patterns may e.g. comprise character patterns (also known as regular expressions) or character structures (even pictures) to be applied when identifying attributes and entities, e.g. Ltd., Corp or A/S indicating that a company attribute or entity is associated with the character pattern in English, American English and Danish, respectively.
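The company markers mentioned above may be illustrated by a small set of regular expressions, given here as a sketch in Python using the standard `re` module. The exact expressions, the table name `REFERENCE_PATTERNS` and the function name `company_markers` are assumptions; a real pattern set would be considerably larger and domain specific.

```python
import re
from typing import List, Tuple

# Hypothetical reference recognition patterns: (marker, language, expression).
# The look-ahead (?!\w) prevents a match inside a longer word such as "Corporate".
REFERENCE_PATTERNS = [
    ("Ltd.", "English", re.compile(r"\bLtd\.?(?!\w)")),
    ("Corp", "American English", re.compile(r"\bCorp\.?(?!\w)")),
    ("A/S", "Danish", re.compile(r"\bA/S(?!\w)")),
]


def company_markers(text: str) -> List[Tuple[str, str]]:
    """Return (marker, language) for every company marker found in the text."""
    return [
        (marker, language)
        for marker, language, pattern in REFERENCE_PATTERNS
        if pattern.search(text)
    ]
```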

[0052] Evidently, such reference patterns will typically be domain specific or at least language specific.

[0053] Moreover, the invention relates to a query processor modeller (QPM) comprising

[0054] means for selecting at least two Query Processor elements (QPE) from a set of predefined query processor elements (QPE),

[0055] means for combining at least two of the selected Query Processor elements (QPE),

[0056] means for executing said associated query processor elements on at least one computer system (CS),

[0057] at least one of said query processor elements (QPE) of the associated query processor elements being a Robot query processor Element (RQPE) adapted for accessing at least one web-based data source (DS).

[0058] According to the invention, a domain-accessing system may be established by means of general components. Moreover, the components may rely on general knowledge about the domain of interest, thereby facilitating very fast establishment of domain-accessing systems.

[0059] When the Query Processor Modeller comprises a graphical user interface (GUI) in the form of a visual programming tool, a further advantageous embodiment of the invention has been obtained.

[0060] When said set of query processor elements (QPE) comprises at least two different types of query processor elements

[0061] at least one type being a robot query processor element (RQPE) and at least one type being a trigger query processor element (TQPE), a further advantageous embodiment of the invention has been obtained.

[0062] Moreover, the invention relates to a query processor maintenance manager (QMM) comprising

[0063] means for executing at least one query processor (QP) established by the domain processor.

[0064] According to the invention, the query processor maintenance manager should be adapted for controlling the processing of an established query processor.

[0065] When said maintenance manager (QMM) comprises means for monitoring the state of at least one query processor element (QPE) or the performance of at least one query processor element (QPE), a further advantageous embodiment of the invention has been obtained.

[0066] When said domain processor maintenance manager (QMM) comprises means for evaluating the data flow between query processor elements (QPE) of a query processor path, a further advantageous embodiment of the invention has been obtained.

[0067] When said domain processor maintenance manager (QMM) comprises means for running and visual monitoring of the individual modules of a query processor, a further advantageous embodiment of the invention has been obtained.

[0068] When said domain processor maintenance manager (QMM) comprises means for running and visual monitoring of a query processor (QP) on an element basis, a further advantageous embodiment of the invention has been obtained.

[0069] According to the invention, the elements may be advantageously monitored as visually separated elements.

[0070] Moreover, the invention relates to a web-robot,

[0071] said robot comprising means for extracting information from web-based data sources (DS) in dependency of at least one extraction model (EM), said at least one extraction model comprising reference data structures defining entities and/or entity structures of data sources in a domain.

[0072] When said robot comprises at least one exchangeable plug-in, said plug-in comprising retrieving routines adapted for reading knowledge stored in said extraction model, said knowledge preferably being domain-specific, a further advantageous embodiment of the invention has been obtained.

[0073] When said plug-in defines reference mapping between extracted data obtained according to said extraction model (EM) and conceptual representation of said data, a further advantageous embodiment of the invention has been obtained.

[0074] When said extraction model (EM) is shared between at least two robots, a further advantageous embodiment of the invention has been obtained.

[0075] Moreover, the invention relates to a query processor (QP),

[0076] said query processor (QP) comprising a set of web-based data sources (DS), wherein at least two of said data sources (DS) comprise entities according to a domain model (DM),

[0077] said query processor (QP) comprising at least three query processor elements (QPE),

[0078] at least two of said query processor elements (QPE) comprising

[0079] a robot (RQPE)

[0080] said robot (RQPE) being attached to at least one data source (DS)

[0081] said robot comprising means for accessing information from the at least one data source (DS) according to at least one extraction model (EM) associated with said robot (RQPE),

[0082] at least one of said query processor elements (QPE) comprising

[0083] a trigger (TQPE)

[0084] said trigger query processor element (TQPE) comprising means for establishing a query.

[0085] The web-based data sources are typically independent.

[0086] The trigger element may be both manually and automatically driven, i.e. by a query user or an automated query routine.

[0087] When at least one of the query processor elements (QPE) comprises a transformer query processor element (TAQPE), a messenger query processor element (MESQPE) or a mediator query processor element (MQPE), a further advantageous embodiment of the invention has been obtained.

[0088] Moreover, the invention relates to a method of establishing at least one query processor (QP),

[0089] said query processor (QP) comprising a set of web-based data sources (DS), wherein at least two of said data sources (DS) comprise entities according to a domain model (DM),

[0090] said query processor (QP) comprising at least three query processor elements (QPE),

[0091] at least two of said query processor elements (QPE) comprising

[0092] a robot (RQPE),

[0093] said robot comprising means for accessing information from the at least one data source (DS) according to at least one extraction model (EM) associated with said robot (RQPE),

[0094] at least one of said query processor elements (QPE) comprising

[0095] a trigger (TQPE),

[0096] said trigger query processor element (TQPE) comprising means for establishing a query,

[0097] said method comprising the steps of

[0098] attaching at least one selected robot query processor element (RQPE) to at least one of the data sources (DS) of the domain,

[0099] combining the selected query processor elements into a query processor (QP) by means of a graphical user interface (GUI).

[0100] It should be noted that the data source may be regarded as either an internal part or an external part of the query processor within the scope of the invention, depending on whether the associated data source is defined by its data or not.

[0101] When said graphical user interface (GUI) defines a query processor element path visually on a drag-and-drop basis, a further advantageous embodiment of the invention has been obtained.

[0102] When at least one of the combined query processor elements (QPE) comprises a transformer query processor element (TAQPE), a messenger query processor element (MESQPE) or a mediator query processor element (MQPE), a further advantageous embodiment of the invention has been obtained.

[0103] Moreover, the invention relates to a method of establishing at least one query processor (QP),

[0104] said query processor comprising means for accessing data from web-based data sources (DS) of a domain by means of at least one user interface (UI)

[0105] said method comprising the steps of

[0106] selecting a number of query processor elements (QPE)

[0107] at least one of said selected query processor elements (QPE)being a robot query processor element (RQPE),

[0108] at least one of said selected query processor elements (QPE)being a trigger query processor element (TQPE),

[0109] attaching at least one selected robot query processor element(RQPE) to at least one of the data sources (DS) of the domain,

[0110] combining the selected query processor elements into at least one query path defining the data flow in the query processor (QP) between the user interface (UI) and the web-based data sources of the domain, said method comprising the further steps of

[0111] customizing the at least one individual robot query processor element (RQPE) to the corresponding attached data sources (DS),

[0112] customizing at least one of the trigger query processor elements (TQPE) to the query processor (QP).

[0113] When at least one of the combined query processor elements (QPE) comprises a transformer query processor element (TAQPE), a messenger query processor element (MESQPE) or a mediator query processor element (MQPE), a further advantageous embodiment of the invention has been obtained.

[0114] Moreover, the invention relates to a method of extracting data from a web-based data source (DS), said method comprising the steps of

[0115] identifying and reading attributes and entities of a web-based data source,

[0116] converting the read entities into instances of conceptual entities,

[0117] verifying whether the read instances correspond with an entity reference base (ERB).

[0118] According to the above-mentioned embodiment of the invention, very advantageous entity processing has been obtained.

[0119] A conceptual model may also include a storage database model.

[0120] When the method comprises at least one step of verifying whether the read instances correspond with an entity reference base (ERB) on the basis of entities represented in said conceptual entity-representing format, a further advantageous embodiment of the invention has been obtained.

[0121] According to the invention, very advantageous processing of entities has been obtained. Hence, a conceptual check of the data may be performed on compactly represented data, thereby reducing processing significantly. Hence, according to the invention, the micro-interpretation of the read entities and attributes is made separately, and prior to macro-interpretation of the entities.

[0122] Micro-interpretation according to the invention may be regarded as the reading of individual string-based attributes on a web-based data source. According to the preferred embodiment of the invention, the combination of read string-based attributes into entities may also be regarded as micro-interpretation performed according to the extraction model.

[0123] An example of micro-interpretation work is e.g. the job (typically performed automatically by software-based routines) of determining whether a read attribute is a “Ford” or a “Fiat”. A further example is the determination of whether an engine is a 75 or 155 Hp engine.

[0124] Entities held in an extraction format are typically string-based, e.g. Fiat, “Fiat”, FIAT, FIATH, etc.

[0125] Entities held in a conceptual format are typically held in an object-like format. Hence Fiat, “Fiat”, FIAT, FIATH are all represented as a Fiat-type in the conceptual format. Such a Fiat type may typically involve an integer representation of a Fiat in old databases whereas new databases may represent Fiat, “Fiat”, FIAT, FIATH as a “Fiat”.
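The transition from the extraction format to the conceptual format may be sketched as below. The enumeration `Make`, the function `interpret_make` and in particular the use of approximate string matching from the Python standard library (`difflib.get_close_matches` with a cutoff of 0.8) to absorb misspellings such as FIATH are assumptions of this sketch; the description does not state how variants are recognized.

```python
import difflib
from enum import Enum
from typing import Optional


class Make(Enum):
    """Conceptual types; the integer values mimic an integer representation."""
    FIAT = 1
    FORD = 2


def interpret_make(raw: str) -> Optional[Make]:
    """Map a string-based attribute to a conceptual Make, or None if no match."""
    # Remove whitespace and straight or typographic quotation marks, fold case.
    cleaned = raw.strip().strip('"\u201c\u201d').lower()
    known = [make.name.lower() for make in Make]
    # Assumption: a similarity of at least 0.8 counts as the same make.
    match = difflib.get_close_matches(cleaned, known, n=1, cutoff=0.8)
    return Make[match[0].upper()] if match else None
```

With these assumptions the four spellings of the example all yield the same conceptual instance, whereas an unrelated string yields `None`.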

[0126] Macro-interpretation according to the invention may typically be regarded as a syntax check performed on the basis of the complete and established instance. Such a check may e.g. be performed with the purpose of verifying whether the established instance of an entity is actually realistic, i.e. consistent.

[0127] Moreover, the conceptually held entities may easily be grouped and filtered and conceptual checks may evidently be performed relatively easily.

[0128] Conceptual representation of the entities according to the invention is typically an object-oriented representation.

[0129] An example of macro-interpretation work is e.g. the job (typically performed automatically by software-based routines) of determining whether read attributes combined into an entity “Fiat”, “120 Hp” and 2.0 liter engine are actually valid. Such a check performed on the basis of a reference base of known (valid) entity types, i.e. a product catalogue, may moreover be performed with the purpose of adding information to the checked instances of entities. Such procedure may be regarded as a deduction of information exemplified by an instance of a car, “Fiat”, “155 Hp” and 2.0 liter. When compared with a reference product catalogue associated with the car domain, such a car may be deduced to be a turbo version, i.e. “Fiat”, “155 Hp”, “2.0 liter” and TURBO.

[0130] According to the invention, macro-interpretation may be performed on instances held in a conceptual format.

[0131] When modifying the verified instances according to the entity reference base (ERB) by adding information associated with said instances corresponding to said entity reference base, a further advantageous embodiment of the invention has been obtained.

[0132] Hence, information may be added to the instances, e.g. by adding further attributes, or by slightly modifying one or several attributes forming the instance of an entity.

[0133] An example may e.g. be the above-mentioned deduction of information exemplified by an instance of a car, “Fiat”, “155 Hp” and 2.0 liter. When compared with a reference product catalogue associated with the car domain, such a car may be deduced to be a turbo version, i.e. “Fiat”, “155 Hp”, “2.0 liter” and TURBO.
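The deduction of the turbo attribute may be sketched as a lookup of the conceptual instance in a reference catalogue. The class `Car`, the two catalogue entries and the function `enrich` are hypothetical; the catalogue contents merely mirror the figures used in the examples of this description.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Car:
    make: str
    horsepower: int
    engine_litres: float
    turbo: Optional[bool] = None  # None: not stated by the data source


# Hypothetical reference product catalogue for the car domain.
CATALOGUE = [
    Car("Fiat", 120, 2.0, turbo=False),
    Car("Fiat", 155, 2.0, turbo=True),
]


def enrich(car: Car) -> Optional[Car]:
    """Add the turbo attribute from the catalogue, or return None if unknown."""
    matches = [
        ref for ref in CATALOGUE
        if (ref.make, ref.horsepower, ref.engine_litres)
        == (car.make, car.horsepower, car.engine_litres)
    ]
    if len(matches) != 1:
        # No unambiguous reference entity: nothing can be deduced.
        return None
    return replace(car, turbo=matches[0].turbo)
```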

[0134] A storage model may typically be relational.

[0135] When correcting the verified instances according to the entity reference base (ERB) by correcting information associated with said instances corresponding to said entity reference base, a further advantageous embodiment of the invention has been obtained.

[0136] Hence, instances may be corrected, e.g. by omitting attributes held in the instance or by modifying one or several attributes forming the instance of an entity.

[0137] An example may e.g. be the above-mentioned deduction of information exemplified by an instance of a car, “Fiat”, “120 Hp”, 2.0 liter engine and Turbo. When compared with a reference product catalogue associated with the car domain, the verification of the instance may result in a correction of the “Turbo” attribute, as the verification procedure may conclude both that (a) no 120 HP Fiat having Turbo is in the reference catalogue and that (b) a 120 HP Fiat without Turbo is most likely the true intended instance of a car. Consequently, a correction routine may correct the instance accordingly or discard the entity entirely.
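The verification outcome described above (accept, correct or discard) may be illustrated by the following sketch. The dictionary-based catalogue, the key layout and the function name `verify` are assumptions; the policy of discarding an instance that has no catalogue counterpart at all is one possible choice among several.

```python
from typing import Dict, Optional, Tuple

# Hypothetical reference catalogue: (make, horsepower, litres) -> has turbo.
CATALOGUE: Dict[Tuple[str, int, float], bool] = {
    ("Fiat", 120, 2.0): False,
    ("Fiat", 155, 2.0): True,
}


def verify(instance: dict) -> Tuple[str, Optional[dict]]:
    """Return ("valid" | "corrected" | "discarded", resulting instance)."""
    key = (instance["make"], instance["hp"], instance["litres"])
    if key not in CATALOGUE:
        # No reference entity resembles the instance: discard it entirely.
        return "discarded", None
    expected_turbo = CATALOGUE[key]
    if instance.get("turbo") == expected_turbo:
        return "valid", instance
    # The catalogue contradicts the read attribute: correct it on a copy.
    return "corrected", {**instance, "turbo": expected_turbo}
```

Applied to the 120 Hp Fiat read with a Turbo attribute, such a routine would return the instance with the attribute corrected, leaving the original record untouched.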

[0138] Moreover, the invention relates to a method of establishing a query processor,

[0139] said query processor being adapted for accessing data on at least two different web-based data sources,

[0140] selecting at least two predefined query processor elements (QPE),

[0141] combining the selected query processor elements into a desired query processor structure.

[0142] According to the invention, the overall structure of a query processor may be purely based on some basically intended design rules, i.e. a robot element must be assigned to a data source, a trigger must feature a manual user interface, a database element must contain retrieved database elements, etc.
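Design rules of this kind lend themselves to a simple structural check before a query processor is run. The sketch below covers only the first two rules mentioned above; the class `ElementSpec`, its fields and the function `design_errors` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ElementSpec:
    kind: str                               # e.g. "robot" or "trigger"
    name: str
    data_source: Optional[str] = None       # assumed required for robots
    user_interface: Optional[str] = None    # assumed required for triggers


def design_errors(elements: List[ElementSpec]) -> List[str]:
    """Return one message per violated design rule; an empty list means OK."""
    errors = []
    for element in elements:
        if element.kind == "robot" and not element.data_source:
            errors.append(f"robot '{element.name}' has no data source")
        if element.kind == "trigger" and not element.user_interface:
            errors.append(f"trigger '{element.name}' has no user interface")
    return errors
```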

[0143] Such a conceptual design of a query processor should preferably be made by means of a graphically-based visual program, e.g. a drag and drop-like design program.

[0144] Evidently, this conceptual programming of a query processor may be made on the basis of more or less structured knowledge about the domain and the data sources of the domain.

[0145] Basically, such a design of a query processor represents the framework for the intended query processor.

[0146] The query processor elements basically represent different sub-frameworks which may all be designed and performed in separate structures or routines. Therefore, the design of query processors by means of different functional properties minimizes “error cross-talk” between the elements and the elements may advantageously be put together initially without dealing with complicated details of the individual elements.

[0147] A query processor according to the invention is established for accessing data of at least two different independent web-based data sources.

[0148] A further advantage of the above-mentioned method is that a break-down of the functional features of a query processor into standardized elements, which may be configurable, may easily be conceived by a programmer.

[0149] A further advantage of the invention is that utilization of standardized elements facilitates the possibility of pre-configuring different variants of a certain element type, thereby offering the possibility of inserting a pre-configured element to the user.

[0150] An example of such pre-configuration of elements may e.g. be a trigger element. Within the (type) group of trigger elements, several variants may be pre-established with great advantage if such trigger elements are utilized often. Therefore, a programmer may e.g. apply a trigger element predefined for triggering a query at certain time intervals. Other types of trigger elements may e.g. be triggers comprising a statistic module applicable for triggering a query according to different system parameters. A third possible type of triggers may e.g. be a manually operated trigger intended for establishment of a query in cooperation with a manually operated user interface.
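The three trigger variants may be sketched as three small classes. All class names, the polling convention (the caller supplies the current time or parameter value) and the simple threshold used in place of a statistic module are assumptions of this illustration.

```python
from typing import Optional


class IntervalTrigger:
    """Fires its query when at least `interval_seconds` have elapsed."""

    def __init__(self, interval_seconds: float, query: str) -> None:
        self.interval_seconds = interval_seconds
        self.query = query
        self._last_fired: Optional[float] = None

    def poll(self, now: float) -> Optional[str]:
        due = (self._last_fired is None
               or now - self._last_fired >= self.interval_seconds)
        if due:
            self._last_fired = now
            return self.query
        return None


class StatisticTrigger:
    """Fires when an observed system parameter reaches a threshold."""

    def __init__(self, threshold: float, query: str) -> None:
        self.threshold = threshold
        self.query = query

    def poll(self, parameter_value: float) -> Optional[str]:
        return self.query if parameter_value >= self.threshold else None


class ManualTrigger:
    """Turns text entered in a user interface into a query."""

    def submit(self, user_text: str) -> str:
        return user_text.strip()
```

Passing the time in as an argument, rather than reading a clock inside the trigger, keeps the sketch deterministic and easy to test.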

[0151] Basically, the invention offers a high-level language facilitating easy web-based access.

[0152] When said at least two predefined query processor elements have different functional characteristics, an advantageous embodiment of the invention has been obtained.

[0153] Different functional characteristics may e.g. be elements functioning as converters, triggers, caches, robots.

[0154] Hence, a query processor according to the invention may be established by means of standardized “bricks”, thereby avoiding the otherwise extremely complicated establishment of a web-oriented query processor.
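The combination of standardized elements into a query path may be illustrated as plain function composition. In this sketch every element is a callable, the robot is a stand-in returning canned strings instead of reading a web-based data source, and all names (`combine`, `manual_trigger`, `stub_robot`, `transformer`) as well as the listing format are hypothetical.

```python
from typing import Callable, List

# Assumption: an element takes the previous element's output as its input.
Element = Callable[[object], object]


def combine(elements: List[Element]) -> Element:
    """Chain elements into one query path, in the order given."""
    def run(payload: object) -> object:
        for element in elements:
            payload = element(payload)
        return payload
    return run


def manual_trigger(user_text: str) -> dict:
    """Trigger element: turn user input into a query."""
    return {"make": user_text.strip().lower()}


def stub_robot(query: dict) -> list:
    """Robot stand-in: filters canned listings instead of accessing the web."""
    listings = ["FIAT 2.0 155 Hp", "Ford 1.6 75 Hp", "fiat 2.0 120 Hp"]
    return [text for text in listings if text.lower().startswith(query["make"])]


def transformer(raw: list) -> list:
    """Transformer element: split extracted strings into simple records."""
    records = []
    for text in raw:
        make, litres, hp, _unit = text.split()
        records.append(
            {"make": make.capitalize(), "litres": float(litres), "hp": int(hp)}
        )
    return records


query_processor = combine([manual_trigger, stub_robot, transformer])
```

Because each element only sees the output of its predecessor, one element can be exchanged or reconfigured without touching the others.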

[0155] When modifying the selected query processor elements according to the data structure of said web-based data sources, a further advantageous embodiment of the invention has been obtained.

[0156] According to the invention, the different elements may be configured or designed independently. Hence, the individual elements may be established so as to fit the individual task(s) of the elements without inducing errors somewhere else in the processing system.

[0157] When said modification of the selected query processor elements comprises at least one plug-in software module, said at least one plug-in defining domain-specific properties of said element, a further advantageous embodiment of the invention has been obtained.

[0158] Hence, domain-specific plug-ins may initially be constructed, e.g. product catalogues, language dictionaries, as completely separate routines. Moreover, the individual elements may be ideally constructed, e.g. a robot, with no or only little knowledge of the language of the data source due to the fact that the basic structure and functioning of the robot is language independent. Product catalogues should likewise be domain specific.

[0159] Moreover, the individual elements may be established with different plug-ins.

[0160] Moreover, the invention relates to a method of establishing a domain-accessing routine,

[0161] said domain comprising a plurality of web-based data sources,

[0162] said method comprising the steps of

[0163] establishing at least one robot ( ) adapted for retrieving entities stored on said plurality of web-based data sources,

[0164] establishing at least one reference catalogue,

[0165] establishing at least one procedure of verifying the retrieved entities by comparing the read entities with the at least one reference catalogue.

[0166] Thereby, an ideal way of retrieving information from a web-based data source has been obtained.

[0167] When said method comprises the steps of

[0168] establishing at least one storage means

[0169] establishing a data-exchanging interface between said at least one robot and at least one storage means, a further advantageous embodiment of the invention has been obtained.

[0170] When said reference catalogue is a product catalogue, a further advantageous embodiment of the invention has been obtained.

[0171] When said established procedure of verification comprises a modification of the retrieved entities if the verification procedure indicates or proves that a read entity is not valid according to the at least one reference catalogue, a further advantageous embodiment of the invention has been obtained.

[0172] Moreover, the invention relates to a query processor maintenancemanager (QMM)

[0173] comprising at least one domain processor user interface (DPUI)

[0174] said manager (QMM) comprising means for evaluating different modules of at least one query processor (QP),

[0175] said means for evaluating different subroutines of said query processor comprising

[0176] means for monitoring the state of at least one query processor element (QPE).

[0177] Hence, the query processor may comprise means for monitoring a robot element, a transformer element, a trigger element, a mediator, etc.

[0178] When said processor comprises means for automatically forwarding messages to said at least one query processor user interface (DPUI) when certain predefined conditions are met, a further advantageous embodiment of the invention has been obtained.

[0179] The predefined conditions may e.g. be conditions determining that a transformer has failed to transform extracted entities into conceptual entities.

[0180] A further predefined condition may be that a maximum load of an element, e.g. a cache or a robot, has been exceeded.
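
The two predefined conditions mentioned above may be illustrated by the following minimal Python sketch. It is not taken from the invention as filed: the names `ElementState`, `transform_failed`, `load_exceeded` and `monitor`, as well as the fields and the load units, are hypothetical and serve only to show how monitored element states could be checked and messages forwarded to a user interface.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ElementState:
    """Snapshot of one query processor element (all field names are assumptions)."""
    name: str                   # e.g. "robot-1" or "transformer-1"
    load: int                   # current load, in arbitrary units
    max_load: int               # configured maximum load
    failed_transforms: int = 0  # extracted entities that could not be transformed


# A condition inspects a state and returns a message, or None if it is not met.
Condition = Callable[[ElementState], Optional[str]]


def transform_failed(state: ElementState) -> Optional[str]:
    """Condition: a transformer failed to produce conceptual entities."""
    if state.failed_transforms > 0:
        return f"{state.name}: {state.failed_transforms} entities could not be transformed"
    return None


def load_exceeded(state: ElementState) -> Optional[str]:
    """Condition: the maximum load of an element has been exceeded."""
    if state.load > state.max_load:
        return f"{state.name}: load {state.load} exceeds maximum {state.max_load}"
    return None


def monitor(states: List[ElementState],
            conditions: List[Condition],
            forward: Callable[[str], None]) -> None:
    """Check every condition against every element and forward each message."""
    for state in states:
        for condition in conditions:
            message = condition(state)
            if message is not None:
                forward(message)
```

In this sketch the `forward` callable stands in for the user interface; passing e.g. the `append` method of a list collects the messages for inspection.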

[0181] When said manager (QMM) comprises means for modifying individual query processor elements/sub-routines, a further advantageous embodiment of the invention has been obtained.

[0182] The means for modifying individual query processor elements/sub-routines may e.g. comprise an editor for the robots or means for modifying plug-ins centrally.

[0183] An example of such an editor may e.g. be the interface of a Query Processor Modeller in which the individual query processor elements may be edited simply by clicking on the elements and thereby starting the editor related to the activated element. Such an editor may e.g. be a Robotmaker, if a robot is clicked on, or a domain modeller if a transformer element is clicked on.

[0184] When said manager (QMM) comprises means for modifying the query flow in the query processor during execution of the query processor, a further advantageous embodiment of the invention has been obtained.

[0185] When allowing realtime editing in the query processor, the up-time of the query processor may be maximized. This realtime editor should preferably comprise means for blocking different query paths of the query processor without invoking fault conditions on the associated signal paths.

[0186] Means for modifying the query flow may e.g. comprise a mute element included in a query path. The activation of such a mute element may then cause the involved branch to be out of work, whereas the rest of the query processor may proceed unaffected, except insofar as queries or entities (i.e. data) from the muted branch are needed for the query to proceed. Typically, missing the queries and entities from one branch of the query processor is preferable to shutting the complete query processor down.

[0187] Meanwhile, the elements of the muted branch, e.g. a robot or a transformer, may be “repaired” or updated without resulting in run-time errors.

[0188] A further advantageous variant of the above-mentioned modification may be a halt routine acting as the above-mentioned mute but including a memory which may catch and store queries, and subsequently resume processing by means of the caught and stored queries.
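
The mute element and the halt routine may be sketched as follows. This is an illustrative Python sketch only; the class `Branch`, its three modes and the function `run_query` are assumptions made for the example, and `handler` merely stands in for the robot and transformer chain of a query path.

```python
from collections import deque
from typing import Callable, Deque, List


class Branch:
    """One query path of a query processor (hypothetical illustration)."""

    def __init__(self, name: str, handler: Callable[[str], List[str]]) -> None:
        self.name = name
        self.handler = handler              # stands in for robot + transformer
        self.mode = "active"                # "active", "muted" or "halted"
        self.pending: Deque[str] = deque()  # queries caught while halted

    def submit(self, query: str) -> List[str]:
        if self.mode == "muted":
            return []                       # contributes nothing, raises no fault
        if self.mode == "halted":
            self.pending.append(query)      # catch and store the query
            return []
        return self.handler(query)

    def resume(self) -> List[str]:
        """Reactivate the branch and process the queries stored while halted."""
        self.mode = "active"
        results: List[str] = []
        while self.pending:
            results.extend(self.handler(self.pending.popleft()))
        return results


def run_query(branches: List[Branch], query: str) -> List[str]:
    """Send a query down every branch and merge whatever comes back."""
    results: List[str] = []
    for branch in branches:
        results.extend(branch.submit(query))
    return results
```

A muted branch simply returns no entities, so the remaining branches are unaffected, whereas a halted branch keeps its queries and answers them once `resume` is called.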

THE FIGURES

[0189] The invention will be described below with reference to the drawings of which

[0190] FIG. 1 illustrates some basic principles of a query processor system,

[0191] FIG. 2 illustrates a basic approach according to the invention when dealing with domain processing,

[0192] FIG. 3 illustrates the process of establishing a domain processor according to a preferred embodiment of the invention,

[0193] FIGS. 4 to 6 illustrate the principles of one embodiment of a domain modeller according to one embodiment of the invention,

[0194] FIG. 7 illustrates the principles of an applicable robot-making program according to one embodiment of the invention,

[0195] FIG. 8 illustrates the functionality of a query processor modeller according to one embodiment of the invention and

[0196] FIG. 9 illustrates a possible user interface of a domain execution manager.

DETAILED DESCRIPTION

[0197] FIG. 1a illustrates the basic principles of a web-based marketplace.

[0198] A web-based market place generally comprises a number of web-based data sources DS. The data sources are e.g. web-sites associated with a homepage of a data source owner. Typically, the data are transferred according to the HTTP protocol. Other protocols, e.g. the WAP protocol or HTTPS, are also applicable.

[0199] The data sources DS are typically databases or are powered by a database DB of the data owner.

[0200] It should be noted that a marketplace may moreover comprise non-web based data sources accessed by means of e.g. ODBC drivers.

[0201] The data sources offer information, products, services, etc. free or for sale.

[0202] According to the invention, a market place should technically deal with one domain only, but evidently, several domains may be overlaid and thereby offer a market place dealing with different domains.

[0203] An example of such a domain may e.g. be a car market place. The cars of the domain are offered for sale on the individual web-based data sources DS, and the cars may be new or used. A domain may include data sources of different nationalities and be in many languages. On the other hand, a car market place offering used cars would typically only comprise cars offered for sale in one country.

[0204] Other exemplary domains may be jobs, services, stocks, odds, boats etc.

[0205] It should be noted that web-based access to the data sources facilitates a very broad coverage of the entire domain due to the fact that web-based data sources may be accessed without any kind of cooperation between the accessing party and the data source owner. Typically, the data sources will be independent.

[0206] According to the invention, the content of the data source of the domain will be regarded as entities. An entity has different properties, here defined as attributes.

[0207] An example of an entity is a specific car offered for sale, e.g. a Porsche, and attributes may be color, e.g. black, engine, e.g. 3.0 liters, etc.

[0208] Another example of an entity is a specific boat described by a number of suitable boat-describing parameters, or attributes, such as length, price, year, etc.

[0209] When reading a web-based data source DS, a combination of attributes will typically be read and interpreted as a car. Such reading of attributes may be regarded as an extraction of information from the web-based data source according to the invention.

[0210] The data sources DS may be accessed by reading and/or writing.

[0211] The data sources may be accessed via a domain handling system, i.e. a processing system, implemented by software in hardware on the illustrated computer system CS. The computer system may comprise one central server or a number of coupled servers located centrally or decentrally. Such a system may be regarded as a query processor QP. The query processor is adapted for querying the data sources automatically or upon a request, i.e. a query Q, made by a user U. The request is performed by means of a user interface implemented on a user platform UPF.

[0212] As illustrated in FIG. 1b, a User Platform UPF typically comprises a computer-based user interface which may be manually operated by a user U.

[0213] Hence, a user may forward a query Q to the data sources DS via the query processor QP. The query may be processed in many steps and the query processor QP may also include a data cache or a database for storing entities retrieved from the data sources DS for statistical purposes or for speeding up the query process.

[0214] The individual web-based data sources are accessed (i.e. read and/or written) by means of robots attached to the data sources. Typically, one robot is uniquely assigned to a corresponding data source DS.

[0215] The definition of a robot differs significantly from the somewhat popular definitions and the more scientific definitions.

[0216] The definition adopted in this application is that a robot is a kind of automatic process established with the purpose of accessing web-based data. A robot is a sub-arrangement of a so-called agent.

[0217] According to the invention, a robot is a software-based automatic process established with the purpose of accessing web-based data sources. According to the invention, a robot may even comprise some kind of intelligence embedded in the process establishing elements. It should be noted that a robot according to this definition may even be regarded as an agent by some practitioners within the art.

[0218] According to the invention, the agent has no personality, and it is not autonomous, nor mobile, in the sense that the agent is free to be transferred and processed on the local data source servers of the data source owners. A robot according to the invention is established for remote execution in relation to the data sources to be accessed and the robots will only be executed in a particular server environment. It should be noted that this particular environment may obviously include several servers located at different places.

[0219] Again, it should be noted that non-web-based data sources may be added if desired.

[0220] FIG. 1c illustrates the complex nature of a data source to be accessed according to the invention. The illustrated data source DS has a data structure which is initially unrevealed and incompatible with the access tools of the retrieving profile associated with the specific data source DS.

[0221] According to the illustrated embodiment, the character-based information of the data source DS has been converted into a number of attributes of identified text strings. Evidently, attributes may be encoded and decoded in various formats such as character based formats, image based formats and active content formats, such as Java applet, JavaScript application or VB script application.

[0222] The text strings may e.g. be a mix of text strings identifying car names, model names, numbers, etc.

[0223] Subsequently, the data source must be evaluated and interpreted according to an extraction model in order to facilitate access to hidden information by the retrieving profile RP.

[0224] FIG. 1d illustrates identification and categorization of attributes of a data source according to the invention.

[0225] The attributes, i.e. the text strings of the data source, may subsequently be interpreted and combined into so-called entities of associated attributes ASA. The associated attributes may be established so as to comprise certain predefined types of attributes, i.e. categorized attributes.

[0226] An example of an entity is a car entity comprising the categorized attributes CA “Trabant”, '88 and $100,000, where the category of the first attribute is car model, the category of the second attribute is manufacturing year and the category of the third attribute is the price. The above-mentioned entity may also be referred to as an instance of an extraction model. The extraction model defines and describes certain attributes and entities of interest for the domain.
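
As a rough illustration of how text strings might be categorized and combined into an entity of associated attributes, the following Python sketch may be considered. The pattern table, the set of known model names and the function names are hypothetical; the invention does not prescribe any particular recognition technique, and regular expressions are used here only for brevity.

```python
import re
from typing import Dict, List, Optional

# Hypothetical recognizers for two attribute categories of a car domain.
CATEGORY_PATTERNS = {
    "year": re.compile(r"^'\d{2}$|^\d{4}$"),   # e.g. '88 or 1988
    "price": re.compile(r"^\$\d[\d,]*$"),      # e.g. $100,000
}

# Hypothetical list of model names, e.g. taken from a domain dictionary.
KNOWN_MODELS = {"Trabant", "Porsche"}


def categorize(text: str) -> Optional[str]:
    """Return the category of a text string, or None if it is irrelevant."""
    if text in KNOWN_MODELS:
        return "model"
    for category, pattern in CATEGORY_PATTERNS.items():
        if pattern.match(text):
            return category
    return None


def build_entity(strings: List[str]) -> Dict[str, str]:
    """Combine categorized strings into one entity; drop everything else."""
    entity: Dict[str, str] = {}
    for text in strings:
        category = categorize(text)
        if category is not None and category not in entity:
            entity[category] = text
    return entity
```

Strings that match no category, such as advertising text surrounding the car description, are simply left out of the resulting entity.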

[0227] Each entity is established as a set of associated attributes ASA and the irrelevant attributes are filtered away.

[0228] Evidently, the establishment of entities of associated attributes may be performed in several different ways, and more or less automatically, within the scope of the invention. It should be noted that the preferred embodiment of the invention implies a completely automatic establishment of as many robots as possible.

[0229] A semi-automatic robot establishment according to one embodiment of the invention is described in detail with reference to FIGS. 7 to 9.

[0230] Subsequently, the identified entities may be copied into the central database DB means in such a way that the retrieving profile initially performs a query in the database instead of visiting every involved data source DS and lists the results to the user according to a predefined listing format. This feature ensures quick access to the search result. If the user U requires additional information, this information may be obtained by means of a link contained in the above-mentioned result list.

[0231] When the entities have been copied to the database and associated with the retrieving profile, further information is added to the retrieving profile in the form of a robot adapted to the data structure of the specific data source. This robot is associated with the retrieving profile in order to visit the data source according to certain trigger criteria and to reevaluate the data source in order to determine whether the contents of the data source have been changed. Hence, the robot will access the data source e.g. at certain intervals and update the contents of the database if changes have occurred. Such an automatically handled change may take place if e.g. one entity has been removed from the data source and replaced by two other entities when the removed entity represents a sold car and the two new entities represent cars introduced for sale.

[0232] Such a change observed by the robot should of course be reflected in the database, as the sold car has to be removed and the two cars be added to the database in order to reflect the state of the data source when the data source is visited.
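
The update described above may be sketched in Python as a comparison between the entities stored for a data source and the entities found on the latest visit. The function `sync`, the dictionary standing in for the database and the tuple representation of entities are assumptions made for this illustration only.

```python
from typing import Dict, Set, Tuple

Entity = Tuple[str, ...]  # hypothetical flat representation of one entity


def sync(database: Dict[str, Set[Entity]],
         source_id: str,
         current: Set[Entity]) -> Tuple[Set[Entity], Set[Entity]]:
    """Make the stored entities of one data source match its current content.

    Returns the entities that were added and the entities that were removed,
    so that the change may also be registered elsewhere if desired.
    """
    stored = database.get(source_id, set())
    removed = stored - current   # e.g. a car that has been sold
    added = current - stored     # e.g. cars newly introduced for sale
    database[source_id] = set(current)
    return added, removed
```

The returned pair of sets could, for instance, be written to a separate log, leaving the database itself as a mirror of the data source at the time of the visit.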

[0233] A change may likewise be stored and registered for statistical purposes in another database.

[0234] If, on the other hand, the data structure of the data source has changed in such a way that the robot is no longer able to extract the correct information, an error is reported to the retrieving profile. Such an error results in the establishment of a new robot fitting the new structure of the data source.

[0235] It should be noted that each data source typically requires a dedicated robot.

[0236] FIG. 2 illustrates three entity models applied in a preferred embodiment of the invention.

[0237] The three entity models are an extraction model EM, a conceptual model CM and a storage model STM.

[0238] For reasons of simplicity, entities according to the three models are referred to as extraction entities EENT, conceptual entities CENT and storage entities SENT. The entities are also referred to in three different formats, i.e. an extraction format, a conceptual format and a storage format.

[0239] The entity flow is transformed between the different formats by means of converters established for converting the data from one format into another. According to the invention, the converters may preferably be established as so-called transformer elements which will be dealt with in detail below.

[0240] Starting from the web-based data source end, upstream, the entities are accessed according to an extraction model preferably common for all involved data sources of the domain. The extraction entities simply comprise a serial stream of strings. According to the extraction model, the strings are ordered in such a way that the receiver of the string-stream may recognize what the transmitter actually intends to transmit. This may be established both with accompanying codes or simply as a convention defining the sequence.

[0241] In fact, the extraction model represents more than a data format. It also defines the different attributes which the robots should access when dealing with the different data sources. In other words, the extraction model represents a framework in which the designers may design the robots. The robot designers may therefore concentrate fully on designing a robot capable of accessing the attributes contained in the extraction model and on combining the attributes into entities according to the extraction model, i.e. extraction entities.

[0242] The extraction entities may nevertheless be established e.g. wholly or partly by automated extraction routines. For a given web-based data source, such routines may e.g. be adapted for automatic reading of the data source representation, automatic recognition of attribute patterns of the web-based data source, and outputting of these attributes as extraction entities according to the extraction model.

[0243] Moreover, such automated routines may evidently be adapted for assigning the specifically discovered attribute/entity patterns of a data source to a corresponding robot.

[0244] According to the preferred embodiment of the invention, the extraction model may be established by means of a domain modeller DMR.

[0245] The extraction entities may then be converted, e.g. by a transformer, into conceptual entities. Among other things, the conceptual model representation of an entity involves a conversion of the individual entity into a unique object. In a simplified manner, an extraction entity comprising a string stream of “Porsche”, “Red”, “3.0”, “Diesel” is converted into a unique car object, a conceptual entity, being a Porsche which is red and with a 3.0 liter diesel engine.
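
The conversion of the above string stream into a conceptual entity may be sketched as follows. The class `Car`, its field names and the function `transform` are hypothetical, and the sketch assumes the simplest possible convention, namely that the extraction model fixes the order of the four strings.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Car:
    """Conceptual entity of a car domain (field names are assumptions)."""
    make: str
    color: str
    engine_liters: float
    fuel: str


def transform(extraction_entity: Sequence[str]) -> Car:
    """Convert a four-string extraction entity into a typed car object.

    Raises ValueError if the stream has the wrong length or the engine
    size is not a number, so that a failed transformation is detectable.
    """
    make, color, engine, fuel = extraction_entity
    return Car(
        make=make,
        color=color.lower(),
        engine_liters=float(engine),
        fuel=fuel.lower(),
    )
```

Because the resulting object is typed, later steps may compare or filter on, say, the engine size as a number rather than as a string.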

[0246] The conceptual format moreover offers the possibility of handling the entities in a compact way. Now, the entities may be represented in an object-oriented manner instead of a flat string format.

[0247] Moreover, a conceptual approach to the entities offers the possibility of adding knowledge to the retrieved entities. Such information may e.g. be information deducible from a reference product catalogue. Thus, if an entity is matched with an entity type of the product catalogue, the entity may be modified, e.g. as a validation, a correction or as an insertion of additional information about the entity.

[0248] A correction may e.g. be that one of the attributes of the Porsche retrieved above is false according to the product catalogue. This false attribute may be detected in several different ways within the scope of the invention. The reference product catalogue may e.g. initially reveal that no Porsche having a 3.0 liter engine has been made with a diesel engine. Moreover, the product catalogue may reveal that no Porsche has been made with a diesel engine, thereby raising the probability that the data source provider has made a mistake. The wrong attribute “Diesel” may then be corrected.
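
A catalogue-based correction of this kind may be sketched as follows. The catalogue content below is illustrative data that merely mirrors the example in the text and is not a factual product listing; the function `check_fuel` and the status values are likewise assumptions. The sketch corrects an attribute only when the catalogue leaves exactly one possibility.

```python
from typing import Dict, Set, Tuple

# Hypothetical reference catalogue: fuels offered per make (illustrative only).
CATALOGUE: Dict[str, Set[str]] = {
    "Porsche": {"petrol"},
    "Volvo": {"petrol", "diesel"},
}


def check_fuel(entity: Dict[str, str],
               catalogue: Dict[str, Set[str]] = CATALOGUE) -> Tuple[Dict[str, str], str]:
    """Verify the fuel attribute of an entity against the catalogue.

    Returns the (possibly corrected) entity together with one of the
    statuses "valid", "corrected", "invalid" or "unknown".
    """
    fuels = catalogue.get(entity["make"])
    if fuels is None:
        return entity, "unknown"       # make not present in the catalogue
    if entity["fuel"] in fuels:
        return entity, "valid"
    if len(fuels) == 1:
        # Only one fuel is possible, so the attribute can be corrected.
        corrected = dict(entity, fuel=next(iter(fuels)))
        return corrected, "corrected"
    return entity, "invalid"           # wrong, but no unique correction
```

The original entity is left untouched and a corrected copy is returned, so that the uncorrected reading remains available if needed.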

[0249] Insertion of added information may e.g. follow from recognizing that a Porsche of the above-mentioned type (now assuming that the diesel statement has not been made) has electronic injection. This information may then be inserted as a new attribute to the unique conceptual entity Porsche or in the fill-in of a text field attribute of the Porsche.

[0250] Validation comprises the step of evaluating whether the currently investigated conceptual entity should be regarded as a valid entity at all. Such validation may basically result in the fact that the entity is accepted as a valid entity or that the entity is discarded. Subsequently, a valid entity may be further processed with the purpose of deducing information about the entity as described above.

[0251] A discarded entity may result in a further investigation of the original data source with the purpose of evaluating whether an entity has been overlooked. Evidently, a realtime evaluation of the discard rate of each data source should be performed with the purpose of monitoring whether the robot or the extraction model associated with the individual data source needs an update or replacement.
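
One conceivable way of evaluating the discard rate per data source is sketched below. The class `DiscardMonitor`, the threshold and the minimum sample count are hypothetical choices made for the illustration; suitable values would depend on the domain.

```python
from collections import defaultdict
from typing import DefaultDict


class DiscardMonitor:
    """Tracks, per data source, how many entities failed validation."""

    def __init__(self, threshold: float = 0.5, min_samples: int = 10) -> None:
        self.threshold = threshold      # discard rate considered suspicious
        self.min_samples = min_samples  # do not judge on too few entities
        self.seen: DefaultDict[str, int] = defaultdict(int)
        self.discarded: DefaultDict[str, int] = defaultdict(int)

    def record(self, source: str, valid: bool) -> None:
        """Register the validation outcome of one entity from a source."""
        self.seen[source] += 1
        if not valid:
            self.discarded[source] += 1

    def rate(self, source: str) -> float:
        """Fraction of entities from the source that were discarded."""
        seen = self.seen[source]
        return self.discarded[source] / seen if seen else 0.0

    def needs_review(self, source: str) -> bool:
        """True if the robot or extraction model of the source looks outdated."""
        return (self.seen[source] >= self.min_samples
                and self.rate(source) > self.threshold)
```

A source flagged by `needs_review` would then be a candidate for a robot update or replacement.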

[0252] Typically, every possible attribute of a conceptual entity should be predefined in the conceptual model. According to a preferred embodiment of the invention, the conceptual entities and attributes should be established by means of a domain modeller.

[0253] The conceptual model should typically be made by people having a certain kind of knowledge about the domain. It should, nevertheless, be emphasized that the establishment of relevant attributes may be heavily supported by automated procedures traversing through the domain and identifying the offered combinations of attributes.

[0254] The last entity model is the storage model. The storage model is primarily adapted for applying traditional database structures and database handling methods to the retrieved entities. Thus, the modeling of a storage model may be performed with very little knowledge of the nature of the domain but more or less by focussing on the involved attributes and entities.

[0255] Evidently, other entity format approaches may be applied within the scope of the invention. Specifically, the distinction between the different models may be softened up a little in the sense that the conceptual model and the data storage model may more or less be incorporated in one body.

[0256] Evidently, the invention features the possibility of performing centralized processing when data retrieved from the different data sources are represented according to a generalized entity model, e.g. a conceptual model.

[0257] The extraction format may be understood as an analogue format while the conceptual/storage format may be regarded as a digital format.

[0258] The extraction entities are typically entities extracted directly from the web-based data sources, the conceptual entities are typically the entities flowing in the heart of the query processor capable of more complex processing, and the storage entities are typically the entities represented in e.g. a relational database.

[0259] It should be emphasized that the different models, e.g. the above-mentioned extraction model EM, conceptual model CM and storage model STM, may facilitate an entity flow both ways: downstream as described above from the data sources to the user querying the query processor, or upstream from a user submitting an entity or a request, e.g. an order, to a certain data source.

[0260] If, for instance, a user wants to buy an item found in the domain, he may then submit an order associated with a chosen entity, e.g. a PC, car, etc. This order would comprise the selected item as a storage or conceptual entity which is subsequently converted in the query processor and submitted to the relevant data source according to the extraction model. An extraction model according to the invention may thus be defined both as a way of reading the data source and as a way of writing (submitting) entities into the data source, e.g. by means of a form into a shopping cart of the data source or a data search form associated with the relevant data source.

[0261] Preferably, the two functions, reading and writing, should be supported by two separate distinct models for the purpose of clarity, i.e. one model for reading the data source, an extraction model, and one model for writing to a data source, a submission model.

[0262] The first format, the extraction format, is the format in which the entities are accessed in the web-based data source. This format is evidently a little fragile and unhandy due to the fact that this string-based entity stream is primarily based on transmission of data supposed to be entities and attributes of entities. This fragile extraction format may typically not be supported significantly by validity checks due to the fact that the extracted entities are difficult to process on a large scale. Such processing would involve major complex string-based processing.

[0263] The conceptual format is established on the basis of the predefined conceptual model defining the basic nature of the entities of the domain. The conceptual representation may fundamentally be regarded as an object-oriented representation of the read entities. A conceptual representation of the read entities is relatively easy to process in the sense that the entities are converted into unique instances of the conceptual model, thereby offering filtering, conversion or modification of any information related to the individual instances of predefined information, e.g. attributes, types of attributes etc. consistent with the conceptual model.

[0264] The storage format is basically intended for storing the retrieved entities for later access. The storage format represents a more handy representation of the retrieved entities of the domain in the sense that superfluous information, e.g. information contained in or related to the conceptual model, may be omitted. Such information may e.g. be entity information utilized for converting the extraction entities into conceptual entities. Such information need no longer be present in the storage model as the entities are now conceived as unique entities.

[0265] The entities stored in a database according to the storage model may (and should) instead be used for statistical purposes.

[0266] The conceptual model and the storage model may be more or less overlapping but, preferably, these formats should be dealt with separately, thereby obtaining the possibility of reusing the storage model and even the conceptual model in other applications. Moreover, the strict separation between the applied data models allows the individual models to be modified individually without considering interaction with the other models under some circumstances. An example of such a simple modification of a model is the modification of a classification module which may basically be established without any modification of other modules as long as no new entity attributes have been introduced or removed.

[0267] A part of the extraction model may be global or at least multiple in the sense that this part of the model may contain general plug-ins of the extraction model applicable for many or all data sources to be accessed. An example of such general plug-ins may e.g. be a language dictionary defining different applicable languages, e.g. English, Japanese, French or Danish. Moreover, the language dictionary may contain a domain-specific dictionary focussing on the entities characterizing the domain.

[0268] FIG. 3 illustrates the process of establishing a domain processor according to a preferred embodiment of the invention.

[0269] It should be noted that the establishment of the components and logistics needed for collecting data from a domain and the maintenance of the components may be performed in other ways within the scope of the invention.

[0270] Initially, the main steps to be introduced below with reference to FIG. 3 will be described briefly. A thorough discussion of the steps and the meaning of these steps will be made below with reference to the subsequent figures.

[0271] Initially, it has been decided that a new domain must be established. This domain may e.g. be a domain comprising boats offered for sale which are either used or new.

[0272] The boats are offered for sale from different web-based market places, typically the homepage of a dealer or e.g. private homepages.

[0273] As discussed later, web-based data sources may be supplemented by e.g. direct reading in a dealer's database, e.g. by means of ODBC based reading. Nevertheless, the domain should basically always be located in at least two different web-based data sources.

[0274] Moreover, the web-based data source may typically be accessed without the consent or knowledge of the web-based data source owner. Consequently, there are no strict sign-up requirements by the data source owner. Therefore, the data foundation of the domain is huge, insofar as it more or less includes all entities offered for sale in the complete worldwide web.

[0275] The decision that a new domain ND has to be made initially invokes the Domain modeller DMR to establish the characteristics of the domain. These characteristics are to be used when establishing the different technical measures needed for accessing the web-based data sources. Details of the functioning of the very important Domain Modeller DMR will be discussed later. It should be noted that the domain modeller may operate more or less automatically.

[0276] According to a preferred embodiment of the invention, the domain modeller DMR outputs a specific Domain model DM needed for the different software modules, also named elements, to be used when establishing the query processor for the domain. Hence, the elements described at a later point may advantageously utilize the domain model DM for different or overlapping purposes. The domain model DM may comprise a knowledge base describing different general features and aspects of the invention so to speak. Such a general knowledge “container” benefits from the fact that the knowledge describing the domain may be established centrally, thereby obtaining a compact knowledge structure which may be modified centrally and basically without dealing with complicated details of the different query processor elements. Therefore, the domain model represents a knowledge structure that may be accessed by the different query processor elements simply by defining a so-called plug-in to each or some of the query processor elements. The plug-in may represent a domain reading structure, e.g. Java code, adapted for reading a certain part of the domain suitable for the establishment and functioning of the element. Therefore, different elements may utilize different parts of the knowledge. Moreover, the centrally organized knowledge may be modified centrally, so that all elements automatically utilize an updated knowledge base with little or typically no modification of the elements or the plug-ins.

[0277] According to the invention, some general knowledge may evidently be decentralized, i.e. put into the individual query processor elements. However, according to a preferred embodiment of the invention, the central knowledge base, or the domain model DM, should be maximized.

[0278] A domain model DM may e.g. comprise a reference product catalogue describing all known products of the domain, e.g. a list of different known car models and variants of such models.

[0279] Furthermore, the domain model DM may comprise mappings between different entity models applied by the query processor, e.g. conversion mappings between extraction entities, conceptual entities and storage entities.

[0280] Furthermore, the domain model may e.g. comprise the extraction, conceptual and storage models.

[0281] Also, the domain model may comprise language dictionaries, both domain-specific and more general dictionaries.

[0282] By applying a domain model, a change in the domain model may be reflected uniformly in the complete query processor.

[0283] The next step, Create Query Processor CQP, initiates the combination of different elements by means of a Query Processor Modeller QPM. Some of the elements combined by the Query Processor Modeller QPM are established by the domain modeller DMR and some of the components are general preestablished elements.

[0284] Other elements to be used may e.g. be robots intended for accessing the data of the individual sites.

[0285] The next step, Create Accessors CA, initiates the assignment of individual robots to specific data sources of the domain. A detailed description of such a robot-generating program may be found in PCT/DK00/00163 and PCT/DK00/00429 filed by the applicant, which are hereby incorporated by reference.

[0286] The last step, Maintenance, involves the establishment of different procedures intended for maintaining the query processor. Such procedures may e.g. be establishment of a robot and system monitoring. Such monitoring may e.g. include the monitoring of the load of the software elements/modules and whether the robots actually fit the sites, etc.

[0287] Moreover, such procedures may include modifying or exchanging robots if such actions are considered necessary.

[0288] Evidently, the chronology of the above-mentioned steps may be modified within the scope of the invention, e.g. by establishing the robots before the query processor is combined in the Create Query Processor step.

[0289] FIGS. 4 to 6 illustrate the principles of a domain modelleraccording to one embodiment of the invention.

[0290] Evidently, the user interface providing the domain modellingfeatures to the user may be established in numerous variants within thescope of the invention.

[0291] According to the illustrated embodiment, the relations betweenthe table of the database are made in a selectable “edit” environment.Evidently, a combined view/edit environment is applicable within thescope of the invention.

[0292] The illustrated domain modeller comprises an interface having amenu bar comprising four different selectable menus File, Edit, View andMapping.

[0293] FIG. 4a illustrates that the menu View has been selected. The View menu, which is a Relationships Window, may comprise several menu items: Storage model, Extraction model, Conceptual model and Submission model. The models define the different entity models adapted by the complete query processor. Evidently, different kinds of entity models and definitions of entity models may be adapted within the scope of the invention.

[0294] The database model may also be referred to as a storage model.

[0295] In FIG. 4a the Database model view has been selected.

[0296] The view area VA appearing when selecting Storage Model View illustrates the basic components of the database attached to the domain by means of visual indications of relations between the tables. The database model defines the structure of a database intended for storage and handling of the entities of the domain. A database model is typically a relational database rather than a flat-file database in order to accommodate the knowledge obtained by the query processor.

[0297] The Relationships window may be in different “show relationships” modes, e.g. “Show All Relationships” or “Show Direct Relationships”.

[0298] The first mode shows all tables of the current database. The other mode shows the tables of the database within the currently selected domain. When selecting the available tables, the viewer will show the relationships to all tables related directly to the selected table.

[0299] Basically, this viewing area VA may operate like known visualizing tools adapted for viewing relations between tables of relational databases.

[0300] According to the illustrated embodiment, the viewer is in the second mode. An open domain model intended for attachment to a PC distributing domain comprises a PC Equipment table PCE. The illustrated PCE table comprises an ID, DealerID, ProdID and Price. The first is a primary key to the PCE table, while DealerID and ProdID are foreign keys to the tables DCAT and PCAT, respectively.

[0301] The PCE table refers to a product catalogue PCAT and a dealer's catalogue DCAT. The product catalogue PCAT is a table of the products attached to the domain and intended for sale. The dealer's catalogue DCAT is a table of the dealers attached to the domain. Finally, the PCE table refers to price.

[0302] Evidently, such a PCE table would typically be more complex, e.g. comprising relations of tables comprising further product characteristics such as color, comments to the products, currency, URL etc.
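
The table structure described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the text names the PCE columns ID, DealerID, ProdID and Price, whereas the column types, the Name columns of PCAT and DCAT and the sample rows are hypothetical.

```python
import sqlite3

# Only ID, DealerID, ProdID and Price are named in the text; the Name columns
# and all column types are illustrative assumptions.
SCHEMA = """
CREATE TABLE PCAT (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE DCAT (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE PCE (
    ID       INTEGER PRIMARY KEY,
    DealerID INTEGER NOT NULL REFERENCES DCAT(ID),
    ProdID   INTEGER NOT NULL REFERENCES PCAT(ID),
    Price    INTEGER
);
"""

def build_storage_model():
    """Create an in-memory database holding the three tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the two foreign keys
    conn.executescript(SCHEMA)
    return conn

conn = build_storage_model()
conn.execute("INSERT INTO PCAT VALUES (1, 'Laptop')")    # hypothetical product
conn.execute("INSERT INTO DCAT VALUES (1, 'Dealer A')")  # hypothetical dealer
conn.execute("INSERT INTO PCE VALUES (1, 1, 1, 999)")

# Follow both foreign keys from PCE to the dealer and product catalogues.
rows = conn.execute(
    "SELECT DCAT.Name, PCAT.Name, PCE.Price FROM PCE "
    "JOIN DCAT ON DCAT.ID = PCE.DealerID "
    "JOIN PCAT ON PCAT.ID = PCE.ProdID"
).fetchall()
```

With foreign-key enforcement switched on, a PCE row that refers to an unknown dealer or product is rejected by the database itself.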

[0303] When double-clicking on the Price field of the PCE table, the Price field definitions appear as a dialogue box PD. This field may be applied for defining the Price field. The illustrated Price field has the name “Price” and the field type may be selected as a string or an integer, here selected as an integer.

[0304] FIG. 4b illustrates that the menu Mapping has been selected. The Mapping menu, which is a table or Relationships Window, may comprise several menu items, e.g. the illustrated EM to CM, CM to STM, STM to CM or CM to SM.

[0305] The first-mentioned mappings, EM to CM and CM to STM, deal with mappings needed for retrieval of entities from a data source, while the two latter deal with writing, i.e. submission to a data source (e.g. filling-in of a form in a data source to place an order, filling-in of a search form or insertion of a new entity in the data source).

[0306] The EM to CM, Extraction model to Conceptual model mapping, defines the mapping between the entities and/or attributes retrieved according to the extraction model EM into entities and/or attributes according to a conceptual model CM.

[0307] The CM to STM, Conceptual model to Storage model mapping, defines the mapping between the entities and/or attributes held according to the conceptual model CM into entities and/or attributes according to a storage model STM.

[0308] The STM to CM, Storage model to Conceptual model mapping, defines the mapping between the entities and/or attributes represented according to the storage model STM into entities and/or attributes according to a conceptual model CM.

[0309] The CM to SM, Conceptual model to Submission model mapping, defines the mapping between the entities and/or attributes represented according to the conceptual model CM into entities and/or attributes according to a submission model SM.

[0310] Evidently, the mapping from one model to another may be performed in several other ways than the table-based method illustrated in FIG. 4b within the scope of the invention.

[0311] Thus, the mapping may include direct transformation of a number of associated attributes into a unique object in a relational manner. That is, the bundle of associated extractions is transformed as a whole into one unique object instead of applying the above-mentioned method of initially mapping the extraction attributes into conceptual attributes, and then subsequently establishing a unique entity on the basis of a reference system, e.g. a product catalogue defining different possible entities of the domain.

[0312] The mapping from the extraction model to the conceptual model preferably involves a classifier (i.e. a classification system) that will map extracted entities into conceptual entities according to a product catalogue. That is, the product catalogue may contain various (generic) conceptual entities existing in the domain.

[0313] After classification, if a classifier is at all available in the domain, the conceptual entities are made unique according to the extracted entities by transferring various attribute values from the extracted entities to the conceptual entities, such as price, URL, currency etc. This transfer of values from extraction entities to conceptual entities is done by selecting and configuring a transfer function that maps one or more extraction model attribute values into one or more conceptual model attribute values.
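
The two steps described above, classification against a product catalogue followed by a transfer of attribute values, may be sketched as follows. The catalogue contents, the substring-matching classifier, the attribute names and the "amount currency" price format are all assumptions chosen for illustration; the text does not prescribe any of them.

```python
# Hypothetical product catalogue of generic conceptual entities.
CATALOGUE = {
    "ford focus": {"make": "Ford", "model": "Focus"},
    "fiat punto": {"make": "Fiat", "model": "Punto"},
}

def classify(extracted):
    """Pick the catalogue entry whose key occurs in the extracted title, if any."""
    title = extracted["title"].lower()
    for key, generic in CATALOGUE.items():
        if key in title:
            return dict(generic)  # copy, so the catalogue itself stays generic
    return None

def transfer_price(extracted, conceptual):
    """Transfer function: one extraction attribute into two conceptual attributes."""
    amount, currency = extracted["price"].split()
    conceptual["price"] = int(amount)
    conceptual["currency"] = currency

def em_to_cm(extracted):
    """Classify an extracted entity, then make it unique by transferring values."""
    conceptual = classify(extracted)
    if conceptual is None:
        return None  # no catalogue match: nothing to make unique
    transfer_price(extracted, conceptual)
    conceptual["url"] = extracted["url"]
    return conceptual

entity = em_to_cm(
    {"title": "Ford Focus 1.6", "price": "9500 EUR", "url": "http://example.com/1"}
)
```

Here the price transfer maps one extraction attribute into two conceptual attributes, which is one instance of the "one or more to one or more" mapping mentioned in the text.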

[0314] In FIG. 4b, the EM to CM has been selected.

[0315] The view area appearing when selecting EM to CM attributes illustrates the attributes to be converted into conceptual entities, e.g. in the form of a table.

[0316] In FIG. 4b, the extraction attribute “Make” has been selected, thereby opening a mapping table where EM-CA A has been selected. The table comprises different applicable mappings from extraction attributes to conceptual attributes, here exemplified by the strings Ferrari, Fiat and Ford converted into integers 17, 18 and 19, respectively.
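
A table-based mapping of this kind amounts to a simple lookup. The following sketch uses the three values given in the text; the function name, the whitespace stripping and the handling of unmapped strings are assumptions.

```python
# Mapping table for the extraction attribute "Make", using the values from the text.
MAKE_EM_TO_CM = {"Ferrari": 17, "Fiat": 18, "Ford": 19}

def map_make(extracted_make):
    """Return the conceptual integer code for an extracted make, or None if unmapped."""
    # Stripping surrounding whitespace is an assumption about the raw strings.
    return MAKE_EM_TO_CM.get(extracted_make.strip())
```

Returning None for an unmapped string leaves it to the caller to decide whether the entity is dropped or reported to an operator.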

[0317] FIG. 5 illustrates that the PCE table has been double-clicked. A PCE dialogue box PCED appears. This dialogue box facilitates editing of the data defining the PCE table, e.g. by insertion of SQL statements associated with the PCE table, attribute names, etc. Finally, the table may be generated by selecting the Table Generate tag, TAG.

[0318] Basically, the storage model may be modeled by known prior art database-generating tools. The important thing when dealing with the database model for the specific domain is to include all necessary attributes and establish a well-structured, easily searchable and quickly accessible database. It should be noted that this structuring of the domain database may be performed independently of the rest of the domain query processor, as long as the necessary entity attributes have been defined.

[0319] FIG. 6 illustrates the Domain Modeller's Extraction model viewer.

[0320] In FIG. 6, the Domain Modeller's Extraction model viewer has been selected.

[0321] While the database in the database modeller viewer may be regarded as the representation of entities “understood” by the query processor, the domain extraction model to be made by the extraction modeller may be regarded as the definition of relevant attributes included in the syntax of “raw” string-based data of the web-based data sources to be accessed as defined by the data source provider.

[0322] Robotmaker

[0323] FIG. 7 illustrates the principles of an applicable robot-establishing program according to one embodiment of the invention.

[0324] Evidently, the robots to be used in the query processor may be established and attached to a certain data source in many ways within the scope of the invention.

[0325] The main principle of the robot generator mentioned below is to make a robot and assign it to a certain site containing data relevant to the domain of interest, i.e. assign the robot to the site by means of an address, e.g. a URL, and generate a data reader (the robot) capable of reading the data of interest contained in the data source, e.g. a web site, and transferring these data in a certain data format to the central control of a query processor in response to a query.

[0326] Hence, according to a preferred embodiment of the invention, a new and unique robot has to be made for each web-based data source to be queried.

[0327] Turning now to FIG. 7, a short overview of this program will be described.

[0328] A detailed description of such a robot-generating program may be found in PCT/DK00/00163 and PCT/DK00/00429 filed by the applicant, which are hereby incorporated by reference.

[0329] The nodes may be arranged in straightforward paths. However, the nodes are typically arranged in branched IF-THEN paths.

[0330] The robot-generating program is adapted for establishing sequential access of a web-based data source. The control of this sequential reading is e.g. established by means of a graphical path of node processors NP, each node processor NP performing some configurable processing of its input. The nodes are sequenced in such a manner that a web-based data source, e.g. in HTML, may be traversed and data extracted or submitted. It should be noted that high-volume establishment of such robots is somewhat time-consuming. Hence, the robot-generating programs should be very user friendly or even automatic.
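
A straightforward (unbranched) path of node processors may be sketched as a list of functions, each processing the output of the previous one. The node names, the regular-expression extraction and the sample page are assumptions made for illustration; the actual node processors are described in the referenced PCT applications, and regular expressions are used here only to keep the sketch short.

```python
import re

def extract_rows(html):
    """Node processor: split an HTML table into row strings."""
    return re.findall(r"<tr>(.*?)</tr>", html, flags=re.S)

def extract_cells(rows):
    """Node processor: split every row into its cell texts."""
    return [re.findall(r"<td>(.*?)</td>", row, flags=re.S) for row in rows]

def drop_empty(records):
    """Node processor: discard rows without <td> cells (e.g. header rows using <th>)."""
    return [record for record in records if record]

def run_path(path, document):
    """Feed the document through the node processors in sequence."""
    data = document
    for node in path:
        data = node(data)
    return data

# Hypothetical page fragment standing in for a web-based data source.
PAGE = (
    "<table><tr><th>Make</th><th>Price</th></tr>"
    "<tr><td>Ford</td><td>9500</td></tr></table>"
)
records = run_path([extract_rows, extract_cells, drop_empty], PAGE)
```

A branched IF-THEN path could be modelled in the same style by letting a node choose which sub-path receives its output.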

[0331] A node processor selector NPS is adapted for configuration to the current application in the node processor configuration view NPC. Moreover, the node processor may be attached to a certain document area by means of a document range definer DRD.

[0332] Finally, the robot maker viewer comprises a document view which e.g. may be adapted for viewing the XML text of the data source or a part of the data source.

[0333] Basically, the robot maker outputs robots and each robot is specialized in operating one dedicated web-based data source.

[0334] According to the preferred embodiment of the invention, the robot outputs entities according to the extraction model(s), i.e. non-classified or interpreted data, to a central control, e.g. to a transformer query processor element. Here, the extracted strings may be converted into coded representations, e.g. as objects stored in a database, and the extracted data may then be classified.

[0335] Evidently, according to additional/other embodiments of the invention, the established robots may contain transforming means for transformation of extracted data into a conceptual representation, e.g. conversion of a sequence of strings “Ford”, “2.0”, “red” into an object stored in a database as a “car”, which is a red Ford having a 2.0 liter engine. It should be noted that the preferred embodiments of the invention benefit from a more central transformation of entities into conceptual data, thereby reducing the requirements of maintaining decentral transformers.
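
The conversion of the string sequence in this example into a conceptual object may be sketched as follows. The class name, the field names and the assumed order of the strings (make, engine size, color) are illustrative only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Car:
    """Hypothetical conceptual representation of a car entity."""
    make: str
    engine_litres: float
    color: str

def to_conceptual(strings):
    """Convert the extracted strings (make, engine size, color) into a Car object."""
    make, engine, color = strings
    return Car(make=make, engine_litres=float(engine), color=color)

car = to_conceptual(("Ford", "2.0", "red"))
```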

[0336] Query Processor Modeller

[0337] A query processor modeller according to the invention is intended for establishment of the “transfer function” between the user, the web data accessing machine and the data located in a web-based data source. The “transfer function” involves a data flow from the user towards the data accessing machine and/or the web-based data sources. Moreover, the transfer function involves control of the flow of data from web-based data sources towards the web-data extraction machine and/or the user.

[0338] According to a preferred embodiment of the invention, this functionality is referred to as a query process flow and the established “accessing machine” is referred to as a query processor. The query processor will preferably be adapted for processing of a certain well-defined domain, e.g. a car domain. It should be noted that some kind of overlapping between the domains may be acceptable in the sense that one query processor may e.g. comprise query processor elements accessing data from different domains. Preferably, the domains should be separated since a query processor should only deal with one domain.

[0339] The query processor will be defined in a query process graph below by means of a visual programming tool.

[0340] FIG. 8 illustrates a preferred embodiment of the invention involving a visual programming tool for establishing the above-mentioned transfer function by means of a query processor graph QPG.

[0341] According to a preferred embodiment of the invention, the query processor modeller comprises a visual and programmable editor. The illustrated editor facilitates the combination of a number of Query Processor Elements QPE into a query processor graph. The query processor elements may be of different types defined by their main functions.

[0342] Initially, a short introduction of query processor elements will be provided.

[0343] An example of a query processor element QPE may e.g. be a robot, such as a robot query processor element RQPE. A robot query processor element RQPE is adapted for accessing web-based data sources upon request. A single robot may typically be attached to one single data source.

[0344] Evidently, a robot query processor element may also be adapted for reading only or writing only if suitable.

[0345] Another example of a query processor element QPE may e.g. be a cache, such as a cache query processor element CQPE. Such an element is adapted for returning a response to a query, or it may guide the query further on in the process if the cache contains no answer to the query. A further possibility is that the cache element CQPE returns the part of the response which may be established by means of the entities already contained in the cache, and forwards a query further upstream in the processor in order to establish the rest of the response.
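
The partial-response behaviour described above may be sketched as follows. The key-based query format, the class name and the stand-in robot are assumptions; a real cache element would operate on entities and queries of the domain.

```python
class CacheElement:
    """Sketch of a cache element: answers from memory, forwards the rest upstream."""

    def __init__(self, upstream):
        self.upstream = upstream  # callable taking a list of keys, returning a dict
        self.store = {}

    def query(self, keys):
        found = {k: self.store[k] for k in keys if k in self.store}
        missing = [k for k in keys if k not in self.store]
        if missing:
            # Only the unanswered part of the query travels further upstream.
            fetched = self.upstream(missing)
            self.store.update(fetched)
            found.update(fetched)
        return found

calls = []  # records what actually reaches the stand-in robot

def fake_robot(keys):
    """Hypothetical upstream element standing in for a robot."""
    calls.append(list(keys))
    return {k: f"entity-{k}" for k in keys}

cache = CacheElement(fake_robot)
first = cache.query(["a", "b"])   # nothing cached yet: both keys go upstream
second = cache.query(["b", "c"])  # "b" is answered locally, only "c" goes upstream
```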

[0346] A further example of a query processor element QPE may e.g. be a so-called mediator query processor element MQPE. This element is adapted for distributing an incoming query to other query processor elements and for gathering the response returned by these queried processor elements, e.g. robots, and returning the answer back to the processor which queried the mediator MQPE.

[0347] Another query processor element may be of a trigger type, i.e. a trigger processor element TPE adapted for triggering a certain operation or a query.

[0348] The trigger processor element TQPE is adapted for initiating a certain action, e.g. an automatically scheduled initiation of a query in the case of an automatic trigger processor element ATPE. Another applicable trigger processor element TPE may e.g. be a trigger adapted for initiation of a query upon request by a user, i.e. a manually activated trigger MTPE. It should be noted that the latter trigger processors represent another type of query processor element than the first. The trigger query processor element is not activated by an incoming query but at its own initiative. Hence, a manually operated trigger element MTPE may be regarded as an element including a user.

[0349] Turning now to FIG. 8, the figure illustrates a query processor adapted for processing a certain domain. According to the illustrated embodiment, the domain comprises three web-based data sources. The illustrated query processor QP is constructed and monitored by means of a visually programmed drag-and-drop query processor graph QPG. The establishment of this query processor graph may also include the configuration of the individual query processor elements. The configuration of e.g. a robot may thus be performed by means of an embedded robot modeller which may be activated via the Query Processor Modeller.

[0350] The illustrated query processor graph comprises three robot query processor elements RQPE1, RQPE2 and RQPE3.

[0351] Each robot is attached to a specific, dedicated data source, i.e. determined by the URL of the data source. Each robot is made automatically or semi-automatically by means of a robot modeller RM, referred to both as robot maker and robot modeller RM in this application. The robots RQPE1, RQPE2 and RQPE3 are adapted for accessing, i.e. reading and/or writing, the associated data source (not shown) according to a read/write pattern defined and associated with the individual robots. This defined read/write pattern enables each robot to access the corresponding data source. According to a preferred embodiment of the invention there is a one-to-one relationship between the robots and the data sources, i.e. one web-based data source is accessed by one robot only. The read/write pattern in the robot is typically highly specialized in order to fit the specific data structure of the associated data source. It should be noted that web-based data structures are typically programmed and structured independently, e.g. in HTML tables or other more or less unforeseeable data structures.

[0352] The establishment of a read/write pattern may also be referred to as a creation of a robot.

[0353] Evidently, the invention offers different web-based data source owners the possibility of entering their data in a data structure which is easy to access by the query processor. Such easy access may e.g. be provided to the data source owners in the form of design requirements if they want their data source to be roboted. Likewise, the query processor may also include data-accessing robots, e.g. by featuring direct ODBC access to the database of the data owner. Thus, it will sometimes be possible to assign a standard robot type to such a generalized data source if so desired.

[0354] According to a preferred embodiment of the invention, requirements to the data source owner will be kept low, thereby offering the possibility of accessing numerous different data sources.

[0355] Turning now to the defined robot query processor element RQPE1, this robot is dedicated to a specific web-based data source and communicates with a query processor element in the form of a cache CQPE1. The cache may be activated by a trigger TQPE1. This trigger element TQPE1 may initiate a certain trigger-defined query subsequently performed by the robot query processor element RQPE1.

[0356] The cache element CQPE1 may e.g. be provided as an encapsulation of the robot's data source. This direct and local pre-cache operation on one data source provides the possibility of reducing access time to certain data of the data source operated by the robot RQPE1. Evidently, this facility is attractive for the purpose of bootstrapping the cache with entities (data of the data structure of the data source) that are often queried. According to a preferred embodiment of the invention, the trigger element TQPE1 should typically ensure that data often queried are updated regularly in order to avoid a completely empty cache. Evidently, this control may also be integrated in the cache CQPE1 within the scope of the invention. The cache CQPE1 is coupled to a mediator query processor element MQPE1. The functioning of the mediator MQPE1 will be described below. Moreover, the cache element CQPE1 may e.g. be adapted for the purpose of reducing the load on the specific site roboted by the robot element RQPE1 in a more strict sense, as the cache may be adapted for returning entities stored in the cache without querying the robot, irrespective of the fact that the entities stored in the cache are not completely updated. Thus, the local cache element CQPE1 may set a minimum interval for activation of the robot RQPE1, thereby ensuring that each and every query does not necessarily result in a query of the data source. This application of a cache may ensure that a certain site is not overloaded by the robot.

[0357] A further robot query processor element RQPE2 is dedicated to a specific web-based data source and communicates with a query processor element in the form of a transformer TAQPE1. The transformer element TAQPE1 is adapted for receiving a query from a user-activated query element MTPE located downstream and passing it on towards the data sources located upstream. The illustrated transformer element TAQPE1 channels an unmodified query further on to the robot query processor element RQPE2. Subsequently, when the robot RQPE2 returns a reply to the query, the response may be modified by the transformer before being returned to the connected mediator MQPE1. Such a modification may e.g. be established as a trivial mapping of km: 34 to be read as km: 34,000 or the like. Preferably, transformers should be used for such purposes when certain data sources, e.g. web sites, use terms deviating from the general terms applied by other data source providers within the domain.
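
A transformer of this kind, passing the query upstream unmodified and rewriting only the response, may be sketched as follows. The class and function names, the dictionary representation of entities and the stand-in robot are assumptions; the km example is the one given in the text.

```python
def km_in_thousands(entity):
    """Return a copy of the entity with 'km' scaled from thousands to kilometres."""
    fixed = dict(entity)
    fixed["km"] = entity["km"] * 1000
    return fixed

class TransformerElement:
    """Sketch of a transformer placed between a mediator and a robot."""

    def __init__(self, upstream, on_response):
        self.upstream = upstream        # callable: query -> list of entities
        self.on_response = on_response  # callable applied to every returned entity

    def query(self, q):
        # The query passes upstream unmodified; only the response is rewritten.
        return [self.on_response(entity) for entity in self.upstream(q)]

# Hypothetical robot whose site reports mileage in thousands of kilometres.
taqpe1 = TransformerElement(lambda q: [{"make": "Ford", "km": 34}], km_in_thousands)
result = taqpe1.query({"make": "Ford"})
```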

[0358] The system comprises a further robot query processor element RQPE3 dedicated to a specific web-based data source. This robot RQPE3 is directly coupled to the mediator MQPE1.

[0359] The mediator MQPE1 is applied for branching the query process path into several different paths, e.g. three as illustrated. During the return path, the mediator collects the information obtained by the queried robot branches and returns the data to a transformer element TAQPE2.

[0360] This transformer element TAQPE2 defines a principle borderline between the upstream robots RQPE1, RQPE2 and RQPE3 and the downstream user U as the transformer performs a transformation of data retrieved by the robots into conceptual data according to a conceptual model associated with each robot. These conceptual data are handed over from the transformer element TAQPE2 to a cache query processor element CQPE2. Typically, the conceptual model should be common for all involved elements dealing with entities in a conceptual manner.

[0361] The cache element CQPE2 may be regarded as the main storage means for the query processor QP intended for storage of the currently updated entities retrieved by the robots of the query processor.

[0362] The nature of the cache may vary significantly from application to application. In some applications, the cache may comprise only recently entered conceptual data, while caches in other applications may comprise a more or less complete database of the entities comprised in the data sources associated with the domain processor.

[0363] The cache CQPE2 may be activated by a trigger query processor element TQPE2.

[0364] This trigger may e.g. be adapted for refreshing the cache CQPE2 according to scheduled trigger criteria. The trigger criteria may be established on the basis of user query statistics and/or statistics associated with data stored in the cache CQPE2.

[0365] The data contained in the cache CQPE2 are conceptual data.

[0366] The cache CQPE2 is coupled to a user interface represented by a manually operated trigger element MTPE located downstream of the query processor graph via a tracking module TMO adapted for gathering and storing data. The gathered data are used for keeping track of the history of data contained in the data sources of the domain and for establishing and maintaining query statistics. This tracking module is a combination of a number of query processor elements QPE.

[0367] Basically, the module comprises a storing query processor element SQPE1 adapted for writing data into a database query processor element DBPE1. The database DBPE1 comprises entities retrieved from the associated domain of data sources and the entities are stored according to a preferred storage model. The storage may also contain history-describing data or data from which the entities may be deduced. The storing query processor element SQPE1 may be activated either by a user query or by a trigger query TQPE3. The trigger query processor element TQPE3 is intended to maintain and establish desired data, such as prices of cars or the like, and thereby offer the possibility of registering whether an entity comprised in a data source covered by the domain processor is now offered at another price etc.

[0368] Finally, the illustrated query processor path comprises a transformer element TAQPE3. This transformer element is primarily responsible for transforming conceptual data into storage data in the database DBPE1.

[0369] Short explanations of some of the above-mentioned query processor elements will be provided below.

[0370] Generally, according to a preferred embodiment of the invention, the query processor elements should function without any knowledge of the context.

[0371] The Cache Query Processor Element

[0372] A cache query processor element according to the invention may be implemented in many ways. Generally, the cache should (as a traditional cache) contain some of the entities recently read from one or some of the data sources. The idea of applying a cache should generally be that of reducing access time to the data sources. Generally, the cache may be controlled in many ways, depending on the purpose. Thus, the cache may be activated from time to time by an automatic trigger with the purpose of refreshing the content of the cache with respect to certain types of entities. Triggering of the cache would then imply that the triggered cache forwards a query to the relevant data sources of the domain, collects the response and writes the returned entities into the memory. Obviously, triggering of the cache may be constructed in numerous ways within the scope of the invention as long as the main purpose of the triggering is to obtain the best possible performance of the current application. Evidently, in some domains, the cache should not be applied for entities exceeding a certain age, e.g. 3 minutes, if the entities contained in the domain change quite often.
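
An age limit and a trigger-driven refresh may be sketched as follows. The class name, the per-key fetch function and the injectable clock are assumptions; the default of 180 seconds corresponds to the 3 minutes mentioned above.

```python
import time

class AgeLimitedCache:
    """Sketch of a cache that is not applied for entities beyond a maximum age."""

    def __init__(self, fetch, max_age_seconds=180.0, clock=time.monotonic):
        self.fetch = fetch              # callable: key -> entity (e.g. via a robot)
        self.max_age = max_age_seconds
        self.clock = clock              # injectable so the sketch can be tested
        self.entries = {}               # key -> (timestamp, entity)

    def get(self, key):
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and now - hit[0] <= self.max_age:
            return hit[1]               # fresh enough: no query of the data source
        entity = self.fetch(key)
        self.entries[key] = (now, entity)
        return entity

    def refresh(self, keys):
        """Called by a trigger to re-read selected entities regardless of age."""
        for key in keys:
            self.entries[key] = (self.clock(), self.fetch(key))
```

The same mechanism also gives a minimum interval between activations of the underlying robot, since repeated queries within the age limit are answered from memory.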

[0373] An example of advantageous triggering according to the invention may e.g. be that of triggering the cache with the purpose of refreshing the cache with entities often queried by the users of the query processor. This bootstrapping ensures that start-up time is reduced by maintaining the often queried entities in the cache. The statistical control may therefore imply triggering of the cache which may vary dynamically, i.e. be controlled by the user request.

[0374] A further possible approach may e.g. be triggering of the whole domain once a day, which means that all relevant data contained in all data sources of the domain are read into the cache and that all data are updated at least once a day. Evidently, according to the latter strategy, the cache is controlled in a manner resembling a kind of persistent database.

[0375] The Transformer Query Processor Element

[0376] The transformer query processor element is basically an element which may transform an incoming query or entity to another query or entity. Hence, the transformer works both ways: downstream and upstream.

[0377] Applicable transformer elements may e.g. be transformers transforming raw extracted text-string entities received from upstream (e.g. from a robot) into entities in a conceptual representation of the entities read from the data source according to a preferred embodiment of the invention.

[0378] Further possible transformer elements may e.g. be a transformer receiving conceptual entities and outputting the entities according to a data storage model.

[0379] A further, simpler transformer may e.g. be a mute transformer element arranged in front of a robot or in a certain branch. This mute may be adapted for blocking the entity or query stream in the respective branch. Such a mute transformer may e.g. be advantageous if a certain robot must receive maintenance, thereby offering an operator maintaining a query processor the possibility of modifying or exchanging a certain robot without modifying the query process graph. Hence, a robot may be maintained without simultaneously receiving a stream of queries. It should be noted that the transformers may be arranged in many different positions in the query graph within the scope of the invention.
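
A mute transformer may be sketched as a pass-through element with a switch. The class name, the muted flag and the choice of returning an empty response while muted are assumptions.

```python
class MuteTransformer:
    """Sketch of a mute element placed in front of a robot or a branch."""

    def __init__(self, upstream):
        self.upstream = upstream  # callable: query -> list of entities
        self.muted = False

    def query(self, q):
        if self.muted:
            return []             # branch blocked: the robot behind it sees no queries
        return self.upstream(q)

seen = []  # records the queries that actually reach the stand-in robot

def robot(q):
    """Hypothetical robot standing in for the element under maintenance."""
    seen.append(q)
    return ["entity"]

mute = MuteTransformer(robot)
before = mute.query("q1")
mute.muted = True          # operator starts maintaining the robot
during = mute.query("q2")
mute.muted = False         # maintenance finished, graph unchanged
after = mute.query("q3")
```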

[0380] Trigger Query Processor Element

[0381] The trigger query processor element comprises means e.g. for invoking a query in an element associated with the trigger. The trigger may then comprise a schedule adapted for defining fixed time intervals which determine when to query the associated element, e.g. a cache. Likewise, the trigger may comprise calculation algorithms adapted for calculating suitable trigger conditions, e.g. when to query, and/or how to query. Therefore, the trigger may advantageously comprise statistical evaluation means.
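
A schedule with fixed time intervals may be sketched as follows. The class name, the tick method driven by an external clock and the 60-second interval are assumptions; a statistically controlled trigger would replace the fixed-interval test with its own condition.

```python
class IntervalTrigger:
    """Sketch of a trigger that queries its associated element at fixed intervals."""

    def __init__(self, target, query, interval_seconds):
        self.target = target      # callable invoked with the query, e.g. a cache refresh
        self.query = query
        self.interval = interval_seconds
        self.last_fired = None

    def tick(self, now):
        """Fire the query if the interval has elapsed; return True when fired."""
        if self.last_fired is None or now - self.last_fired >= self.interval:
            self.last_fired = now
            self.target(self.query)
            return True
        return False

fired = []
trigger = IntervalTrigger(fired.append, {"make": "Ford"}, interval_seconds=60)
results = [trigger.tick(now) for now in (0, 30, 60, 90, 120)]
```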

[0382] Mediator Query Processor Element

[0383] A mediator query processor element MQPE is adapted for distributing an incoming query to other query processor elements and for gathering the response returned by these queried processor elements, e.g. robots, and returning the answer back to the processor which initially queried the mediator MQPE.

[0384] Hence, the mediator may show several different levels of intelligence, from the somewhat simple and uncomplicated branch element simply distributing an incoming query to a number of branching elements, to quite intelligent elements capable of distributing an incoming query to the branches most likely comprising the queried entities.

[0385] A mediator may deal with data according to any representation,e.g. conceptual entities, storage entities or extraction entities.
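
The simple, uncomplicated variant of the mediator may be sketched as follows: the query is distributed to every branch and the responses are gathered into one answer. The class name and the list-based response format are assumptions; a more intelligent mediator would select only the branches likely to hold the queried entities.

```python
class Mediator:
    """Sketch of a simple mediator distributing a query to all branches."""

    def __init__(self, branches):
        self.branches = branches  # callables: query -> list of entities

    def query(self, q):
        results = []
        for branch in self.branches:
            results.extend(branch(q))  # gather the response of each branch
        return results

# Three hypothetical branches standing in for the branches of FIG. 8.
mqpe1 = Mediator([
    lambda q: [("site1", q)],
    lambda q: [],
    lambda q: [("site3", q)],
])
answer = mqpe1.query("Ford")
```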

[0386] Messenger Query Processor Element

[0387] Other possible types of query processor elements to be included in the query processor graph may e.g. be messenger query processor elements MESQPE. The messenger elements MESQPE are adapted for monitoring the process of the individual QPEs or between the QPEs. These messengers may e.g. be adapted for returning a processor's state-describing parameters to an operator responsible for the query processor or the query processor element. Messengers may e.g. be adapted for providing statistical material or fault warnings.

[0388] It should be noted that the conceptual building of the domain processor may be performed in many different ways. This means that the word “element” and the word “graph” should in no way restrict the scope of the invention in the sense that the wording primarily reflects the functional understanding of the elements. Evidently, other types of elements may be derived within the scope of the invention, e.g. elements combined on the basis of the above-mentioned elements. Examples of such possible derivatives within the scope of the invention may e.g. be a robot processor comprising a transformer (i.e. the robots read extraction entities, transform the data to conceptual entities, and return the entities to a central control, e.g. a database), a cache comprising a transformer, a cache comprising a trigger, etc.

[0389] A further advantageous messenger may e.g. be a messenger adapted for raising a flag to the operator managing the query processor when the entities to be transformed into conceptual data are not contained in a reference product catalogue, thereby offering the operator the possibility of updating such a catalogue locally or globally.

[0390] Other advantageous elements may e.g. be elements directly adapted for reading a well-known database, i.e. by means of ODBC drivers, thereby making it possible for extracted reading of “foreign” web-based data sources to be supplemented by readings from few or several databases comprising entities included by the domain.

[0391] According to the invention, each of the present elements may be activated by clicking on the element in the editor, thereby initiating/activating the element-creating application. Hence, the RobotMaker application will be activated by double-clicking on a selected robot, e.g. RQPE1, and the Domain Modeller will be activated when double-clicking on e.g. the transformer TAQPE2.

[0392] When the query processor graph QPG has been established, the graph may be saved, thereby maintaining the properties of the complete query processor QP.

[0393] The structure and functioning of the individual query processor elements are defined by means of the domain modeller DMR and the Robotmaker RM. Evidently, some of the query processor elements are domain independent in the sense that they may be included in the query processor graph of several different types of query processors DP, e.g. trigger processor elements with little or no modification, whereas other query processor elements are somewhat domain specific. An example of a domain independent query processor element may e.g. be the aforementioned mute transformer element which may be applied by any desired domain without pre-modification.

[0394] It should be noted that the Query Processor Modeller may even, and preferably, include query processor execution tools included in the illustrated “view” setup. Such a setup may include the illustrated view which, when in run mode, illustrates the running state of the query processor and the individual elements. An example of such intuitive processing is that the individual elements change color according to their state, e.g. within a color range from white to red, depending on the load of the elements.
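
As a purely illustrative sketch, the load-dependent coloring of an element could be computed as shown below. The normalization of the load to the range 0.0 to 1.0, the linear interpolation and the function name load_to_color are assumptions of this sketch; the description only states that the color ranges from white to red.

```python
def load_to_color(load: float) -> str:
    """Map a load in [0.0, 1.0] to a hex color between white (idle) and red (fully loaded)."""
    # Clamp out-of-range values so the result is always a valid color.
    load = min(max(load, 0.0), 1.0)
    # Red stays at full intensity; green and blue fade out as the load grows.
    other = round(255 * (1.0 - load))
    return f"#ff{other:02x}{other:02x}"


print(load_to_color(0.0))
print(load_to_color(1.0))
```

A view in run mode could call such a function each time the load of an element is sampled and repaint the element accordingly.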

[0395] Moreover, the interface, e.g. the illustrated view, should preferably visually illustrate basic on-off conditions, i.e. actively illustrate whether an element is working properly, whether entities are transferred between the query processor elements and preferably whether entities may actually be transferred between elements. The latter feature may ease operation of the system significantly due to the fact that the absence of an entity flow between the elements does not necessarily indicate that a fault condition has occurred; the element may simply not have been queried.

[0396] Determination of a “clear road” between the elements may e.g. be established by forwarding dummy (testing) queries between elements at certain intervals.
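
One conceivable way of forwarding such dummy queries is sketched below, without any claim that this is how the described system performs the check. The names probe_links, fake_send and DUMMY_QUERY, the element name TQPE1 and the representation of links as pairs of element names are assumptions of the sketch; scheduling at certain intervals is left to the caller.

```python
from typing import Callable, Dict, Iterable, Tuple

Link = Tuple[str, str]
DUMMY_QUERY = {"dummy": True}  # assumed marker identifying a testing query


def probe_links(
    links: Iterable[Link],
    send: Callable[[str, str, dict], bool],
) -> Dict[Link, bool]:
    """Send one dummy query over each link and record whether it got through."""
    status: Dict[Link, bool] = {}
    for source, target in links:
        try:
            status[(source, target)] = bool(send(source, target, DUMMY_QUERY))
        except Exception:
            # A transport error is treated as a blocked road.
            status[(source, target)] = False
    return status


def fake_send(source: str, target: str, query: dict) -> bool:
    """Stand-in transport for the example: the link into TAQPE2 is down."""
    return target != "TAQPE2"


links = [("TQPE1", "RQPE1"), ("RQPE1", "TAQPE2")]
status = probe_links(links, fake_send)
blocked = [link for link, ok in status.items() if not ok]
print(blocked)
```

A link reported as blocked by such a probe would indicate a fault, whereas a link that passes the probe but carries no entities would merely be idle.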

[0397] Moreover, the Query Processor Modeller may include submenus facilitating specialized execution of the query processor. Such a submenu is illustrated in FIG. 9, and it may e.g. be selected by the “run” drop-down menu of the Query Processor Modeller.

[0398] Moreover, the Query Processor Modeller may feature specialized visualization of certain groups of query processor elements. Thus, a “robot element” viewer may be activated, thereby offering the operator the possibility to concentrate fully on his task, e.g. maintenance or design of robot elements, and thereby ignore elements dealt with by other operators.

[0399] It should be noted that a query processor according to the invention may easily comprise several hundreds of robots.

[0400] Likewise, other designers may advantageously activate a “no robot view” while designing the main body of the query processor.
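
The two specialized views mentioned above may, as an illustrative sketch only, be regarded as simple filters over the elements of the query processor graph. The Element class, its kind labels and the element name TQPE3 are assumptions introduced for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Element:
    name: str
    kind: str  # assumed labels such as "robot", "transformer" or "trigger"


def robot_view(elements: List[Element]) -> List[Element]:
    """Keep only robot elements, as in a 'robot element' viewer."""
    return [e for e in elements if e.kind == "robot"]


def no_robot_view(elements: List[Element]) -> List[Element]:
    """Hide robot elements, as in a 'no robot view'."""
    return [e for e in elements if e.kind != "robot"]


elements = [
    Element("RQPE1", "robot"),
    Element("TAQPE2", "transformer"),
    Element("TQPE3", "trigger"),
]
print([e.name for e in robot_view(elements)])
print([e.name for e in no_robot_view(elements)])
```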

[0401] It should also be noted that the above-mentioned examples of elements may be combined into groups of macro-elements, e.g. a robot element comprising a transformer, etc.

[0402] FIG. 9 illustrates a possible user interface of a domain processor DP. A domain processor is adapted for supporting maintenance of one or several query processors QP when established.

[0403] The illustrated user interface of a domain processor comprises a tree-based structure monitoring area. One domain processor may control execution and maintenance of several different domains.

[0404] This area monitors a first level of node-represented servers NL1. This level illustrates different servers applied, WebServer, RobotServer1, RobotServer2. A second node level NL2 shows the current domains controlled by the domain server, e.g. Cars, Yachts and PCs. A third level NL3 illustrates different selectable query processor state-indicating functions, e.g. queries, triggers and messages. The function Messages has been selected in the illustrated view.

[0405] It should be noted that the term server referred to in level 1 NL1 may either reflect a physical location of a query processor with respect to a server, or refer to a kind of virtual server comprising several different servers, each processing their part (e.g. element or groups of elements) of the query processor.

[0406] Moreover, the illustrated viewer comprises a message viewing area MVA adapted for viewing messages forwarded automatically by e.g. different unique elements of a query processor path or groups of elements. The attributes of listed messages may e.g. be chosen as the illustrated Title, Date, Priority, Origin Element.

[0407] The viewer may moreover facilitate a filtering of the messages according to their origin element. Hence, an operator may e.g. establish a filtering of messages from a certain element, Origin Element, or from groups of elements, e.g. mediators or transformers.
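
A minimal sketch of such message filtering is given below for illustration. The Message class mirrors the attributes Title, Date, Priority and Origin Element named above; the function filter_messages, the use of a name prefix to identify a group of elements and the sample messages are assumptions of the sketch.

```python
import datetime
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Message:
    title: str
    date: datetime.date
    priority: int
    origin_element: str


def filter_messages(
    messages: Iterable[Message],
    origin: Optional[str] = None,
    group_prefix: Optional[str] = None,
) -> List[Message]:
    """Select messages from one element and/or from a group identified by a name prefix."""
    selected = []
    for message in messages:
        if origin is not None and message.origin_element != origin:
            continue
        if group_prefix is not None and not message.origin_element.startswith(group_prefix):
            continue
        selected.append(message)
    return selected


# Sample messages invented for the example.
messages = [
    Message("Source unreachable", datetime.date(2001, 5, 2), 1, "RQPE1"),
    Message("Unknown entity", datetime.date(2001, 5, 3), 2, "TAQPE2"),
    Message("Layout changed", datetime.date(2001, 5, 4), 1, "RQPE7"),
]
print([m.title for m in filter_messages(messages, origin="RQPE1")])
print([m.title for m in filter_messages(messages, group_prefix="RQPE")])
```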

[0408] Moreover, the viewer comprises a message detail window MDW. This window may illustrate details about a single message or groups of selected messages in the message viewing area MVA. Each message may e.g. be associated with a startup facility with the purpose of activating the editor or editors associated with the individual message.

[0409] A query element program, e.g. a robot editor, may be started directly from the domain processor DP, e.g. by automatically importing the data from an element selected in the viewer such as a specific robot.

1. Domain processor (DP) comprising at least one robot modeller (RM), at least one domain modeller (DMR), at least one Query Processor Modeller (QPM), said robot modeller (RM) comprising means for modelling at least one computer-based robot (R), said at least one robot (R) being adapted for accessing at least one web-based data source (DS), said at least one data source (DS) comprising entities comprised in a predefined domain (D), said at least one domain modeller (DMR) comprising means for modelling at least one domain model (DM) associated with at least one chosen domain, said domain model (DM) comprising at least one extraction model (EM) and at least one storage model (STM), means for establishing at least one extraction model (EM) associated with a chosen domain, means for establishing at least one storage model (STM) associated with said chosen domain, said at least one Query Processor Modeller (QPM) comprising means for selecting at least two Query Processor elements (QPE) from a set of predefined query processor elements (QPE), means for combining at least two of the selected Query Processor elements (QPE), means for executing said associated query processor elements on at least one computer system (CS), at least one of said query processor elements (QPE) of the associated query processor elements being a Robot query processor Element (RQPE) adapted for accessing at least one web-based data source (DS).
 2. Domain processor (DP) according to claim 1, wherein the domain processor (DP) comprises at least one query processor maintenance manager (QMM), said at least one query processor maintenance manager (QMM) comprising means for executing at least one query processor (QP) established by the domain processor.
 3. Robot modeller (RM) comprising means for modelling at least one computer-based robot (R), said at least one robot (R) being adapted for accessing at least one web-based data source (DS), said at least one data source (DS) comprising entities comprised in a predefined domain (D).
 4. Domain modeller (DMR) comprising means for modelling at least one domain model (DM) associated with at least one chosen domain, said domain model (DM) comprising at least one extraction model (EM) and at least one storage model (STM), means for establishing at least one extraction model (EM) associated with a chosen domain, means for establishing at least one storage model (STM) associated with said chosen domain.
 5. Domain modeller (DMR) according to claim 4, wherein said domain modeller comprises means for establishing reference mapping between extracted data obtained according to said extraction model (EM) and a conceptual representation of said data.
 6. Domain modeller (DMR) according to claim 4 or 5, wherein said reference mapping defines a set of reference entities describing a number of entities (E), said entities having attributes.
 7. Domain modeller (DMR) according to claims 4 to 6, wherein said domain modeller (DMR) comprises means for establishing at least one language domain dictionary (LDD).
 8. Domain modeller (DMR) according to claims 4-7, wherein said at least one language domain dictionary (LDD) maps the language of the extracted entities into the general language of the query processor (QP).
 9. Domain modeller (DMR) according to claims 4-6, wherein said domain modeller (DMR) comprises means for establishing a set of reference recognition patterns.
 10. Query Processor Modeller (QPM) comprising means for selecting at least two Query Processor elements (QPE) from a set of predefined query processor elements (QPE), means for combining at least two of the selected Query Processor elements (QPE), means for executing said associated query processor elements on at least one computer system (CS), at least one of said query processor elements (QPE) of the associated query processor elements being a Robot query processor Element (RQPE) adapted for accessing at least one web-based data source (DS).
 11. Query Processor Modeller (QPM) according to claim 10, wherein the Query Processor Modeller comprises a graphical user interface (GUI) in the form of a visual programming tool.
 12. Query Processor Modeller (QPM) according to claim 10 or 11 wherein said set of query processor elements (QPE) comprises at least two different types of query processor elements, at least one type being a robot query processor element (RQPE) and at least one type being a trigger query processor element (TQPE).
 13. Query processor maintenance manager (QMM) comprising means for executing at least one query processor (QP) established by the domain processor.
 14. Query processor maintenance manager (QMM) according to claim 13, wherein said maintenance manager (QMM) comprises means for monitoring the state of at least one query processor element (QPE) or the performance of at least one query processor element (QPE).
 15. Query processor maintenance manager (QMM) according to claim 13 or 14, wherein said domain processor maintenance manager (QMM) comprises means for evaluating the data flow between query processor elements (QPE) of a query processor path.
 16. Query processor maintenance manager (QMM) according to claims 13-15, wherein said domain processor maintenance manager (QMM) comprises means for running and visual monitoring of the individual modules of a query processor.
 17. Query processor maintenance manager (QMM) according to claims 13-16, wherein said domain processor maintenance manager (QMM) comprises means for running and visual monitoring of a query processor (QP) on element basis.
 18. Web-robot said robot comprising means for extracting information from web-based data sources (DS) in dependency of at least one extraction model (EM), said at least one extraction model comprising reference data structures defining entities and/or entity structures of data sources in a domain.
 19. Web-robot according to claim 18, said robot comprising at least one exchangeable plug-in, said plug-in comprising retrieving routines adapted for reading knowledge stored in said extraction model, said knowledge preferably being domain-specific.
 20. Web-robot according to claim 18 or 19, wherein said plug-in defines a reference mapping between extracted data obtained according to said extraction model (EM) and a conceptual representation of said data.
 21. Web-robot according to claims 18-20, wherein said extraction model (EM) is shared between at least two robots.
 22. Query processor (QP), said query processor (QP) comprising a set of web-based data sources (DS), wherein at least two of said data sources (DS) comprise entities according to a domain model (DM), said query processor (QP) comprising at least three query processor elements (QPE), at least two of said query processor elements (QPE) comprising a robot (RQPE), said robot (RQPE) being attached to at least one data source (DS), said robot comprising means for accessing information from the at least one data source (DS) according to at least one extraction model (EM) associated with said robot (RQPE), at least one of said query processor elements (QPE) comprising a trigger (TQPE), said trigger query processor element (TQPE) comprising means for establishing a query.
 23. Query processor (QP) according to claim 22, wherein at least one of the query processor elements (QPE) comprises a transformer query processor element (TAQPE), a messenger query processor element (MESQPE) or a mediator query processor element (MQPE).
 24. Method of establishing at least one query processor (QP), said query processor (QP) comprising a set of web-based data sources (DS), wherein at least two of said data sources (DS) comprise entities according to a domain model (DM), said query processor (QP) comprising at least three query processor elements (QPE), at least two of said query processor elements (QPE) comprising a robot (RQPE), said robot comprising means for accessing information from the at least one data source (DS) according to at least one extraction model (EM) associated with said robot (RQPE), at least one of said query processor elements (QPE) comprising a trigger (TQPE), said trigger query processor element (TQPE) comprising means for establishing a query, said method comprising the steps of attaching at least one selected robot query processor element (RQPE) to at least one of the data sources (DS) of the domain, combining the selected query processor elements into a query processor (QP) by means of a graphical user interface (GUI).
 25. Method of establishing at least one query processor (QP) according to claim 24, wherein said graphical user interface (GUI) defines a query processor element path visually on a drag-and-drop basis.
 26. Method of establishing at least one query processor (QP) according to claim 24 or 25, wherein at least one of the combined query processor elements (QPE) comprises a transformer query processor element (TAQPE), a messenger query processor element (MESQPE) or a mediator query processor element (MQPE).
 27. Method of establishing at least one query processor (QP), said query processor comprising means for accessing data from web-based data sources (DS) of a domain by means of at least one user interface (UI), said method comprising the steps of selecting a number of query processor elements (QPE), at least one of said selected query processor elements (QPE) being a robot query processor element (RQPE), at least one of said selected query processor elements (QPE) being a trigger query processor element (TQPE), attaching at least one selected robot query processor element (RQPE) to at least one of the data sources (DS) of the domain, combining the selected query processor elements into at least one query path defining the data flow in the query processor (QP) between the user interface (UI) and the web-based data sources of the domain, said method comprising a further step of customizing the at least one individual robot query processor element (RQPE) to the corresponding attached data sources (DS), customizing at least one of the trigger query processor elements (TQPE) to the query processor (QP).
 28. Method of establishing at least one query processor (QP) according to claim 27, wherein at least one of the combined query processor elements (QPE) comprises a transformer query processor element (TAQPE), a messenger query processor element (MESQPE) or a mediator query processor element (MQPE).
 29. Method of extracting data from a web-based data source (DS), said method comprising the steps of identifying and reading attributes and entities of a web-based data source, converting the read entities into instances of conceptual entities, verifying whether the read instances correspond with an entity reference base (ERB).
 30. Method of extracting data from a web-based data source according to claim 29, whereby the read instances are verified to determine whether they correspond with an entity reference base (ERB) on the basis of entities represented in said conceptual entity-representing format.
 31. Method of extracting data from a web-based data source according to claim 29 or 30, whereby the verified instances are modified according to the entity reference base (ERB) by adding information associated with said instances corresponding to said entity reference base.
 32. Method of extracting data from a web-based data source according to claims 29-31, said method comprising correction of the verified instances according to the entity reference base (ERB) by correcting information associated with said instances corresponding to said entity reference base.
 33. Method of establishing a query processor, said query processor being adapted for accessing data on at least two different web-based data sources, selecting at least two predefined query processor elements (QPE), combining the selected query processor elements into a desired query processor structure.
 34. Method of establishing a query processor according to claim 33, said at least two predefined query processor elements having different functional characteristics.
 35. Method of establishing a query processor according to claims 33 and 34, said method comprising the step of modifying the selected query processor elements according to the data structure of said web-based data sources.
 36. Method of establishing a query processor according to claims 33-35, wherein said modification of the selected query processor elements comprises at least one plug-in software module, said at least one plug-in defining domain-specific properties of said element.
 37. Method of establishing a domain-accessing routine, said domain comprising a plurality of web-based data sources, said method comprising the steps of establishing at least one robot ( ) adapted for retrieving entities stored on said plurality of web-based data sources, establishing at least one reference catalogue, establishing at least one procedure of verifying the retrieved entities by comparing the read entities with the at least one reference catalogue.
 38. Method of establishing a domain-accessing routine according to claim 37, said method comprising the steps of establishing at least one storage means, establishing a data-exchanging interface between said at least one robot and at least one storage means.
 39. Method of establishing a domain-accessing routine according to claims 37-38, wherein said reference catalogue is a product catalogue.
 40. Method of establishing a domain-accessing routine according to claims 37-39, wherein said established procedure of verification comprises modification of the retrieved entities if the verification procedure indicates or proves that a read entity is not valid according to the at least one reference catalogue.
 41. Query processor maintenance manager (QMM) comprising at least one domain processor user interface (DPUI), said manager (QMM) comprising means for evaluating different modules of at least one query processor (QP), said means for evaluating different sub-routines of said query processor comprising means for monitoring the state of at least one query processor element (QPE).
 42. Query processor maintenance manager (QMM) according to claim 41, said processor comprising means for automatically forwarding messages to said at least one query processor user interface (DPUI) when certain predefined conditions are met.
 43. Query processor maintenance manager (QMM) according to claim 41 or 42, said manager (QMM) comprising means for modifying individual query processor elements/sub-routines.
 44. Query processor maintenance manager (QMM) according to claims 41-43, said manager (QMM) comprising means for modifying the query flow in the query processor during execution of the query processor. 